PROCESS FOR THE DEVELOPMENT OF BINDING MINI-PROTEINS

BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to the development of novel binding
mini-proteins, and especially micro-proteins, by an
iterative process of mutagenesis, expression, affinity
selection, and amplification. In this process, a gene
encoding a mini-protein potential binding domain, said gene
being obtained by random mutagenesis of a limited number of
predetermined codons, is fused to a genetic element which
causes the resulting chimeric expression product to be
displayed on the outer surface of a virus (especially a
filamentous phage) or a cell. Affinity selection is then
used to identify viruses or cells whose genome includes
such a fused gene coding for a protein which bound to
the chromatographic target.

Description of the Related Art

The amino acid sequence of a protein determines its
three-dimensional (3D) structure, which in turn determines
protein function. Some residues on the polypeptide chain
are more important than others in determining the 3D
structure of a protein, and hence its ability to bind,
non-covalently, but very tightly and specifically, to
characteristic target molecules.

"Protein engineering" is the art of manipulating the
sequence of a protein in order, e.g., to alter its binding
characteristics. The factors affecting protein binding are
known, but designing new complementary surfaces has proved
difficult. Quiocho et al. (QUIO87) suggest it is unlikely
that, using current protein engineering methods, proteins
can be constructed with binding properties superior to those
of proteins that occur naturally.

Nonetheless, there have been some isolated successes.
For example, Wilkinson et al. (WILK84) reported that a
mutant of the tyrosyl tRNA synthetase of Bacillus
stearothermophilus with the mutation Thr51-->Pro exhibits a
100-fold increase in affinity for ATP.

With the development of recombinant DNA techniques, it
became possible to obtain a mutant protein by mutating the
gene encoding the native protein and then expressing the
mutated gene. Several mutagenesis strategies are known. One,
"protein surgery" (DILL87), involves the introduction of one
or more predetermined mutations within the gene of choice.
A single polypeptide of completely predetermined sequence is
expressed, and its binding characteristics are evaluated.

At the other extreme is random mutagenesis by means of
relatively nonspecific mutagens such as radiation and
various chemical agents. See Ho et al. (HOCJ85) and
Lehtovaara, EP Appln. 285,123.

It is possible to randomly vary predetermined nucleotides
using a mixture of bases in the appropriate cycles of
a nucleic acid synthesis procedure (OLIP86, OLIP87). The
proportion of bases in the mixture, for each position of a
codon, will determine the frequency at which each amino acid
will occur in the polypeptides expressed from the degenerate
DNA population (REID88a; VERS86a; VERS86b). The problem
of unequal abundance of DNA encoding different amino acids
is not discussed.
Ferenci and collaborators have published a series of
papers on the chromatographic isolation of mutants of the
maltose-transport protein LamB of E. coli (FERE82a, FERE82b,
FERE83, FERE84, CLUN84, HEIN87 and papers cited therein).
The mutants were either spontaneous or induced with
nonspecific chemical mutagens. Levels of mutagenesis were picked
to provide single point mutations or single insertions of
two residues. No multiple mutations were sought or found.

While variation was seen in the degree of affinity for
the conventional LamB substrates maltose and starch, there
was no selection for affinity to a target molecule not bound
at all by native LamB, and no multiple mutations were sought
or found. FERE84 speculated that the affinity
chromatographic selection technique could be adapted to development
of similar mutants of other "important bacterial
surface-located enzymes", and to selecting for mutations which
result in the relocation of an intracellular bacterial
protein to the cell surface. Ferenci's mutant surface
proteins would not, however, have been chimeras of a
bacterial surface protein and an exogenous or heterologous
binding domain.

Ferenci also taught that there was no need to clone the
structural gene, or to know the protein structure, active
site, or sequence. The method of the present invention,
however, specifically utilizes a cloned structural gene. It
is not possible to construct and express a chimeric, outer
surface-directed potential binding protein-encoding gene
without cloning.

Ferenci did not limit the mutations to particular loci.
Substitutions were limited by the nature of the mutagen
rather than by the desirability of particular amino acid
types at a particular site. In the present invention,
knowledge of the protein structure, active site and/or
sequence is used as appropriate to predict which residues
are most likely to affect binding activity without unduly
destabilizing the protein, and the mutagenesis is focused
upon those sites. Ferenci does not suggest that surface
residues should be preferentially varied. In consequence,
Ferenci's selection system is much less efficient than that
disclosed herein.

A number of researchers have directed unmutated foreign
antigenic epitopes to the surface of bacteria or phage,
fused to a native bacterial or phage surface protein, and
demonstrated that the epitopes were recognized by
antibodies. Thus, Charbit, et al. (CHAR86a,b) genetically inserted
the C3 epitope of the VP1 coat protein of poliovirus into
the LamB outer membrane protein of E. coli, and determined
immunologically that the C3 epitope was exposed on the
bacterial cell surface. Charbit, et al. (CHAR87) likewise
produced chimeras of LamB and the A (or B) epitopes of the
preS2 region of hepatitis B virus.

WO 92/15677 PCT/US92/01456

A chimeric LacZ/OmpB protein has been expressed in E.
coli and is, depending on the fusion, directed to either the
outer membrane or the periplasm (SILH77). A chimeric
LacZ/OmpA surface protein has also been expressed and
displayed on the surface of E. coli cells (WEIN83). Others
have expressed and displayed on the surface of a cell
chimeras of other bacterial surface proteins, such as E.
coli type 1 fimbriae (HEDE89) and Bacteroides nodosus type
1 fimbriae (JENN89). In none of the recited cases was the
inserted genetic material mutagenized.

Dulbecco (DULB86) suggests a procedure for incorporating
a foreign antigenic epitope into a viral surface
protein so that the expressed chimeric protein is displayed
on the surface of the virus in a manner such that the
foreign epitope is accessible to antibody. In 1985 Smith
(SMIT85) reported inserting a nonfunctional segment of the
EcoRI endonuclease gene into gene III of bacteriophage f1,
"in phase". The gene III protein is a minor coat protein
necessary for infectivity. Smith demonstrated that the
recombinant phage were adsorbed by immobilized antibody
raised against the EcoRI endonuclease, and could be eluted
with acid. De la Cruz et al. (DELA88) have expressed a
fragment of the repeat region of the circumsporozoite
protein from Plasmodium falciparum on the surface of M13 as
an insert in the gene III protein. They showed that the
recombinant phage were both antigenic and immunogenic in
rabbits, and that such recombinant phage could be used for
B epitope mapping. The researchers suggest that similar
recombinant phage could be used for T epitope mapping and
for vaccine development.

None of these researchers suggested mutagenesis of the
inserted material, nor is the inserted material a complete
binding domain conferring on the chimeric protein the
ability to bind specifically to a receptor other than the
antigen combining site of an antibody.

McCafferty et al. (MCCA90) expressed a fusion of an Fv
fragment of an antibody to the N-terminal of the pIII
protein. The Fv fragment was not mutated.

Parmley and Smith (PARM88) suggested that an epitope
library that exhibits all possible hexapeptides could be
constructed and used to isolate epitopes that bind to
antibodies. In discussing the epitope library, the authors
did not suggest that it was desirable to balance the
representation of different amino acids. Nor did they teach
that the insert should encode a complete domain of the
exogenous protein. Epitopes are considered to be
unstructured peptides as opposed to structured proteins.

Scott and Smith (SCOT90) and Cwirla et al. (CWIR90)
prepared "epitope libraries" in which potential hexapeptide
epitopes for a target antibody were randomly mutated by
fusing degenerate oligonucleotides, encoding the epitopes,
with gene III of fd phage, and expressing the fused gene in
phage-infected cells. The cells manufactured fusion phage
which displayed the epitopes on their surface; the phage
which bound to immobilized antibody were eluted with acid
and studied. In both cases, the fused gene featured a
segment encoding a spacer region to separate the variable
region from the wild type pIII sequence so that the varied
amino acids would not be constrained by the nearby pIII
sequence. Devlin et al. (DEVL90) similarly screened, using
M13 phage, for random 15-residue epitopes recognized by
streptavidin. Again, a spacer was used to move the random
peptides away from the rest of the chimeric phage protein.
These references therefore taught away from constraining the
conformational repertoire of the mutated residues.

Another problem with the Scott and Smith, Cwirla et
al., and Devlin et al. libraries was that they provided a
highly biased sampling of the possible amino acids at each
position. Their primary concern in designing the degenerate
oligonucleotide encoding their variable region was to ensure
that all twenty amino acids were encodible at each position;
a secondary consideration was minimizing the frequency of
occurrence of stop signals. Consequently, Scott and Smith
and Cwirla et al. employed NNK (N=equal mixture of G, A, T,
C; K=equal mixture of G and T) while Devlin et al. used NNS
(S=equal mixture of G and C). There was no attempt to
minimize the frequency ratio of most favored-to-least
favored amino acid, or to equalize the rate of occurrence of
acidic and basic amino acids.
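The bias of the NNK and NNS schemes can be made concrete by counting, for each amino acid, how many of the 32 possible codons encode it. The following is an illustrative sketch only, assuming the standard genetic code and equimolar base mixtures; the function names (codon_counts, favored_ratio) are hypothetical and are not taken from the cited references.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(third_position_bases):
    """Count codons per amino acid for N-N-(restricted third base)."""
    return Counter(
        CODON_TABLE[b1 + b2 + b3]
        for b1 in BASES
        for b2 in BASES
        for b3 in third_position_bases
    )


def favored_ratio(counts):
    """Ratio of most-favored to least-favored amino acid (stop excluded)."""
    amino_only = [n for aa, n in counts.items() if aa != "*"]
    return max(amino_only) / min(amino_only)


nnk = codon_counts("GT")  # K = equal mixture of G and T
nns = codon_counts("GC")  # S = equal mixture of G and C

for name, counts in (("NNK", nnk), ("NNS", nns)):
    acidic = counts["D"] + counts["E"]   # Asp + Glu codons
    basic = counts["K"] + counts["R"]    # Lys + Arg codons
    print(name, "ratio", favored_ratio(counts), "stops", counts["*"],
          "acidic", acidic, "basic", basic)
```

Under these assumptions both schemes encode leucine, serine and arginine three times as often as the singly encoded residues, and codons for Lys and Arg outnumber codons for Asp and Glu two to one.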

Devlin et al. characterized several affinity-selected
streptavidin-binding peptides, but did not measure the
affinity constants for these peptides. Cwirla et al. did
determine the affinity constants for their peptides, but were
disappointed to find that their best hexapeptides had
affinities (350-300nM) "orders of magnitude" weaker than that of
the native Met-enkephalin epitope (7nM) recognized by the
target antibody. Cwirla et al. speculated that phage
bearing peptides with higher affinities remained bound under
acidic elution, possibly because of multivalent interactions
between phage (carrying about 4 copies of pIII) and the
divalent target IgG. Scott and Smith were able to find
peptides whose affinity for the target antibody (A2) was
comparable to that of the reference myohemerythrin epitope
(50nM). However, Scott and Smith likewise expressed concern
that some high-affinity peptides were lost, possibly through
irreversible binding of fusion phage to target.

Lam, et al. (LAM91) created a pentapeptide library by
nonbiological synthesis on solid supports. While they teach
that it is desirable to obtain the universe of possible
random pentapeptides in roughly equimolar proportions, they
deliberately excluded cysteine, to eliminate any possibility
of disulfide crosslinking.
Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sept. 1988
and having priority from US application 07/021,046, assigned
to Genex Corp.) (LGB) speculate that diverse single chain
antibody domains (SCAD) may be screened for binding to a
particular antigen by varying the DNA encoding the combining
determining regions of a single chain antibody, subcloning
the SCAD gene into the gpV gene of phage λ so that a
SCAD/gpV chimera is displayed on the outer surface of phage
λ, and selecting phage which bind to the antigen through
affinity chromatography. The only antigen mentioned is
bovine growth hormone. No other binding molecules, targets,
carrier organisms, or outer surface proteins are discussed.
Nor is there any mention of the method or degree of
mutagenesis. Furthermore, there is no teaching as to the
exact structure of the fusion nor of how to identify a
successful fusion or how to proceed if the SCAD is not
displayed.

Ladner and Bird, WO88/06601 (publ. 7 September 1988)
suggest that single chain "pseudodimeric" repressors
(DNA-binding proteins) may be prepared by mutating a putative
linker peptide followed by in vivo selection, and that mutation
and selection may be used to create a dictionary of
recognition elements for use in the design of asymmetric
repressors. The repressors are not displayed on the outer surface
of an organism.

Methods of identifying residues in a protein which can be
replaced with a cysteine in order to promote the formation
of a protein-stabilizing disulfide bond are given in
Pantoliano and Ladner, U.S. Patent No. 4,903,773 (PANT90),
Pantoliano and Ladner (PANT87), Pabo and Suchanek (PABO86),
MATS89, and SAUE86.

Ladner, et al., WO90/02809 describes semirandom
mutagenesis ("variegation") of known proteins, displayed as
domains of semiartificial outer surface proteins of
bacteria, phage or spores, and affinity selection of mutants
having desired binding characteristics. The smallest
proteins specifically mentioned in WO90/02809 are crambin
(3:40, 4:32, 16:26 disulfides; 46 AAs), the third domain of
ovomucoid (56 AAs), and
BPTI (5:55, 14:38, 30:51 disulfides; 58 AAs). WO90/02809
also specifically describes a strategy for "variegating" a
codon to obtain a mix of all twenty amino acids at that
position in approximately equal proportions.

Bass, et al. (BASS90) fused human growth hormone to the
gene III protein of M13 phage. They suggested that hGH and
other "large proteins" might be mutated and "binding
selections" applied.

SUMMARY OF THE INVENTION 
A polypeptide is a polymer composed of a single chain
of the same or different amino acids joined by peptide
bonds. Linear peptides can take up a very large number of
different conformations through internal rotations about the
main chain single bonds of each α carbon. These rotations
are hindered to varying degrees by side groups, with glycine
interfering the least, and valine, isoleucine and,
especially, proline, the most. A polypeptide of 20 residues may
have 10^20 different conformations which it may assume by
various internal rotations.

Proteins are polypeptides which, as a result of
stabilizing interactions between amino acids that are not
necessarily in adjacent positions in the chain, have folded
into a well-defined conformation. This folding is usually
essential to their biological activity.

For polypeptides of 40-60 residues or longer,
noncovalent forces such as hydrogen bonds, salt bridges, and
hydrophobic interactions are sufficient to stabilize a
particular folding or conformation. The polypeptide's
constituent segments are held to more or less that
conformation unless it is perturbed by a denaturant such as high
temperature, or low or high pH, whereupon the polypeptide
unfolds or "melts". The smaller the peptide, the more
likely it is that its conformation will be determined by the
environment. If a small unconstrained peptide has
biological activity, the peptide ligand will be in essence a random
coil until it comes into proximity with its receptor. The
receptor accepts the peptide only in one or a few
conformations because alternative conformations are disfavored by
unfavorable van der Waals and other non-covalent
interactions.

Small polypeptides have potential advantages over
larger polypeptides when used as therapeutic or diagnostic
agents, including (but not limited to):

a) better penetration into tissues,

b) faster elimination from the circulation (important for
imaging agents),

c) lower antigenicity, and

d) higher activity per mass.

Moreover, polypeptides, especially those of less than
about 40 residues, have the advantage of accessibility via
chemical synthesis; polypeptides of under about 30 residues
are particularly preferred. Thus, it would be desirable to
be able to employ the combination of mutation and affinity
selection to identify small polypeptides which bind a target
of choice.

Most polypeptides of this size, however, have
disadvantages as binding molecules. According to Olivera et al.
(OLIV90a): "Peptides in this size range normally equilibrate
among many conformations (in order to have a fixed
conformation, proteins generally have to be much larger)."
Specific binding of a peptide to a target molecule requires
the peptide to take up one conformation that is
complementary to the binding site. For a decapeptide with
three isoenergetic conformations (e.g., β strand, α helix,
and reverse turn) at each residue, there are about 6·10^4
possible overall conformations. Assuming these
conformations to be equi-probable for the unconstrained decapeptide,
if only one of the possible conformations bound to the
binding site, then the affinity of the peptide for the
target would be expected to be about 6·10^4 times higher if it
could be constrained to that single effective conformation.
Thus, the unconstrained decapeptide, relative to a
decapeptide constrained to the correct conformation, would
be expected to exhibit lower affinity. It would also
exhibit lower specificity, since one of the other
conformations of the unconstrained decapeptide might be one which
bound tightly to a material other than the intended target.
By way of corollary, it could have less resistance to
degradation by proteases, since it would be more likely to
provide a binding site for the protease.
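The decapeptide estimate above can be checked with a few lines of arithmetic. This is a minimal sketch under the stated idealization that each residue independently adopts one of a fixed number of equally likely states; the function names are hypothetical, and the choice of ten states per residue for the 20-residue case is only an assumption that reproduces the 10^20 figure given earlier in the Summary.

```python
def conformation_count(residues, states_per_residue):
    """Overall conformations if every residue independently adopts one of
    `states_per_residue` equally likely states."""
    return states_per_residue ** residues


def max_affinity_gain(residues, states_per_residue):
    """Upper bound on the fold-increase in affinity from constraining the
    peptide to the single binding conformation. With equiprobable states
    the unconstrained peptide spends 1/N of its time in that conformation,
    so the gain equals N."""
    return conformation_count(residues, states_per_residue)


decapeptide = conformation_count(10, 3)
print(decapeptide)                # 59049, i.e. about 6·10^4
print(max_affinity_gain(10, 3))   # same number, read as a fold-change

# Assumption: ten states per residue for a 20-residue polypeptide.
print(conformation_count(20, 10) == 10 ** 20)
```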

The present invention overcomes these problems, while
retaining the advantages of smaller polypeptides, by
identifying novel mini-proteins having the desired binding
characteristics. Mini-proteins are small polypeptides
which, while too small to have a stable conformation as a
result of noncovalent forces alone, are covalently
crosslinked (e.g., by disulfide bonds) into a stable
conformation and hence have biological activities more
typical of larger protein molecules than of unconstrained
polypeptides of comparable size. The mini-proteins with
which the present invention is particularly concerned fall
into two categories: (a) disulfide-bonded micro-proteins of
less than 40 amino acids; and (b) metal ion-coordinated
mini-proteins of less than 60 amino acids.

The present invention relates to the construction,
expression, and selection of mutated genes that specify
novel mini-proteins with desirable binding properties, as
well as these mini-proteins themselves, and the "libraries"
of mutant "genetic packages" used to display the
mini-proteins to a potential "target" material. The "targets"
may be, but need not be, proteins. Targets may include
other biological or synthetic macromolecules as well as
other organic and inorganic substances.

The prior application, WO90/02809, generally teaches
that stable protein domains may be mutated in order to
identify new proteins with desirable binding
characteristics. Among the suitable "parental" proteins
which it specifically identifies as useful for this purpose
are three proteins--BPTI (58 residues), the third domain of
ovomucoid (56 residues), and crambin (46 residues)--which
are in the size range of 40-60 residues wherein noncovalent
interactions between nonadjacent amino acids become
significant; all three also contain three disulfide bonds
that enhance the stability of the molecule.

Nowhere in WO90/02809 does one find any specific
recognition that a polypeptide with less than 40 residues,
and especially those with only one or two disulfide bonds,
would have sufficient stability to serve as a "scaffolding"
for mutational variation. These "micro-proteins" are,
nonetheless, of great utility, as previously indicated.

WO90/02809 also suggests the use of a protein, azurin,
having a different form of crosslink (Cu:CYS,HIS,HIS,MET).
However, azurin has 128 amino acids, so it cannot possibly
be considered a mini-protein. The present invention
relates to the use of mini-proteins of less than 60 amino
acids which feature a metal ion-coordinated crosslink.

By virtue of the present invention, proteins are
obtained which can bind specifically to targets other than
the antigen-combining sites of antibodies. A protein is not
to be considered a "binding protein" merely because it can
be bound by an antibody (see definition of "binding protein"
which follows). While almost any amino acid sequence of
more than about 6-8 amino acids is likely, when linked to an
immunogenic carrier, to elicit an immune response, any given
random polypeptide is unlikely to satisfy the stringent
definition of "binding protein" with respect to minimum
affinity and specificity for its substrate. It is only by
testing numerous random polypeptides simultaneously (and, in
the usual case, controlling the extent and character of the
sequence variation, i.e., limiting it to residues of a
potential binding domain having a stable structure, the
residues being chosen as more likely to affect binding than
stability) that this obstacle is overcome.

The appended claims are hereby incorporated by
reference into this specification as an enumeration of the
preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows the main chain of scorpion toxin (Brookhaven
Protein Data Bank entry 1SN3) residues 20 through 42.
CYS25 and CYS41 are shown forming a disulfide. In the
native protein these groups form disulfides to other
cysteines, but no main-chain motion is required to
bring the gamma sulphurs into acceptable geometry.
Residues, other than GLY, are labeled at the β carbon
with the one-letter code.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

I. INTRODUCTION

The fundamental principle of the invention is one of
forced evolution. In nature, evolution results from the
combination of genetic variation, selection for advantageous
traits, and reproduction of the selected individuals,
thereby enriching the population for the trait. The present
invention achieves genetic variation through controlled
random mutagenesis ("variegation") of DNA, yielding a
mixture of DNA molecules encoding different but related
potential binding domains that are mutants of
micro-proteins. It selects for mutated genes that specify novel
proteins with desirable binding properties by 1) arranging
that the product of each mutated gene be displayed on the
outer surface of a replicable genetic package (GP) (a cell,
spore or virus) that contains the gene, and 2) using
affinity selection -- selection for binding to the target
material -- to enrich the population of packages for those
packages containing genes specifying proteins with improved
binding to that target material. Finally, enrichment is
achieved by allowing only the genetic packages which, by
virtue of the displayed protein, bound to the target, to
reproduce. The evolution is "forced" in that selection is
for the target material provided and in that particular
codons are mutagenized at higher-than-natural frequencies.
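The enrichment that results from repeated affinity selection and reproduction can be illustrated with a toy calculation. The sketch below is not part of the disclosed method: it assumes a population containing only two kinds of package (binders and non-binders), fixed per-round recovery yields for each, and amplification that multiplies all recovered packages equally. The function name and every number in it are hypothetical.

```python
def enrich(initial_fraction, binder_yield, background_yield, rounds):
    """Fraction of target-binding packages after each selection round.

    Each round recovers `binder_yield` of the binders and
    `background_yield` of the non-binders. Amplification is assumed to
    multiply all recovered packages equally, so it does not change the
    fraction.
    """
    fractions = [initial_fraction]
    f = initial_fraction
    for _ in range(rounds):
        kept_binders = f * binder_yield
        kept_background = (1.0 - f) * background_yield
        f = kept_binders / (kept_binders + kept_background)
        fractions.append(f)
    return fractions


# Hypothetical numbers: one binder per million packages, 10% recovery of
# binders and 0.01% carry-over of non-binders in each round.
history = enrich(1e-6, 0.1, 1e-4, rounds=3)
for round_number, fraction in enumerate(history):
    print(round_number, f"{fraction:.3g}")
```

With these assumed yields the odds of a package being a binder improve by a factor of 1000 per round, which is why a few cycles can suffice even when binders start out very rare.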
The display strategy is first perfected by modifying a
genetic package to display a stable, structured domain (the
"initial potential binding domain", IPBD) for which an
affinity molecule (which may be an antibody) is obtainable.
The success of the modifications is readily measured by,
e.g., determining whether the modified genetic package binds
to the affinity molecule.

The IPBD is chosen with a view to its tolerance for
extensive mutagenesis. Once it is known that the IPBD can
be displayed on a surface of a package and subjected to
affinity selection, the gene encoding the IPBD is subjected
to a special pattern of multiple mutagenesis, here termed
"variegation", which after appropriate cloning and
amplification steps leads to the production of a population of
genetic packages each of which displays a single potential
binding domain (a mutant of the IPBD), but which
collectively display a multitude of different though structurally
related potential binding domains (PBDs). Each genetic
package carries the version of the pbd gene that encodes the
PBD displayed on the surface of that particular package.
Affinity selection is then used to identify the genetic
packages bearing the PBDs with the desired binding
characteristics, and these genetic packages may then be amplified.
After one or more cycles of enrichment by affinity selection
and amplification, the DNA encoding the successful binding
domains (SBDs) may then be recovered from selected packages.

If need be, the DNA from the SBD-bearing packages may
then be further "variegated", using an SBD of the last round
of variegation as the "parental potential binding domain"
(PPBD) to the next generation of PBDs, and the process
continued until the worker in the art is satisfied with the
result. Because of the structural and evolutionary
relationship between the IPBD and the first generation of
PBDs, the IPBD is also considered a "parental potential
binding domain" (PPBD).

When micro-proteins are variegated, the residues which
are covalently crosslinked in the parental molecule are left
unchanged, thereby stabilizing the conformation. For
example, in the variegation of a disulfide bonded
micro-protein, certain cysteines are invariant so that under the
conditions of expression and display, covalent crosslinks
(e.g., disulfide bonds between one or more pairs of
cysteines) form, and substantially constrain the
conformation which may be adopted by the hypervariable linearly
intermediate amino acids. In other words, a constraining
scaffolding is engineered into polypeptides which are
otherwise extensively randomized.

Once a micro-protein of desired binding characteristics
is characterized, it may be produced, not only by
recombinant DNA techniques, but also by nonbiological
synthetic methods.

For the purposes of the appended claims, a protein P is
a "binding protein" if for at least one molecular, ionic or
atomic species A, other than the variable domain of an
antibody, the dissociation constant K_D(P,A) < 10^-6
moles/liter (preferably, < 10^-7 moles/liter).

The exclusion of "variable domain of an antibody" in
the definition above is intended to make clear that for the purposes
herein a protein is not to be considered a "binding protein"
merely because it is antigenic.
Most larger proteins fold into distinguishable globules
called domains (ROSS81). Protein domains have been defined
various ways; definitions of "domain" which emphasize
stability -- retention of the overall structure in the face
of perturbing forces such as elevated temperatures or
chaotropic agents -- are favored, though atomic coordinates
and protein sequence homology are not completely ignored.

When a domain of a protein is primarily responsible for
the protein's ability to specifically bind a chosen target,
it is referred to herein as a "binding domain" (BD).

The term "variegated DNA" (vgDNA) refers to a mixture
of DNA molecules of the same or similar length which, when
aligned, vary at some codons so as to encode at each such
codon a plurality of different amino acids, but which encode
only a single amino acid at other codon positions. It is
further understood that in variegated DNA, the codons which
are variable, and the range and frequency of occurrence of
the different amino acids which a given variable codon
encodes, are determined in advance by the synthesizer of the
DNA, even though the synthetic method does not allow one to
know, a priori, the sequence of any individual DNA molecule
in the mixture. The number of designated variable codons in
the variegated DNA is preferably no more than 20 codons, and
more preferably no more than 5-10 codons. The mix of amino
acids encoded at each variable codon may differ from codon
to codon.
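The definition of variegated DNA can be illustrated by writing a design as a list of codons, some fixed and some degenerate, and counting what the mixture contains. This is a sketch under assumptions: the template shown is hypothetical (it is not a design from this specification), only a few IUPAC ambiguity codes are included, and the standard genetic code is assumed. It shows that the designer fixes which codons vary and which amino acids each can encode, while the sequence of any individual molecule remains unknown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Subset of the IUPAC nucleotide ambiguity codes used in this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "K": "GT", "S": "CG", "N": "ACGT"}


def expand(codon):
    """All concrete codons matched by a (possibly degenerate) codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon))]


def variable_codons(template):
    """Number of codon positions that encode more than one amino acid."""
    return sum(len({CODON_TABLE[c] for c in expand(codon)}) > 1
               for codon in template)


def library_size(template):
    """(distinct DNA sequences, distinct stop-free protein sequences)."""
    dna, protein = 1, 1
    for codon in template:
        concrete = expand(codon)
        dna *= len(concrete)
        protein *= len({CODON_TABLE[c] for c in concrete} - {"*"})
    return dna, protein


# Hypothetical template: two fixed cysteine codons flanking variable codons.
template = ["TGT", "NNK", "NNK", "GGT", "NNK", "TGT"]
print(variable_codons(template))  # 3
print(library_size(template))     # (32768, 8000)
```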

A population of genetic packages into which variegated
DNA has been introduced is likewise said to be "variegated".

For the purposes of this invention, the term "potential
binding protein" (PBP) refers to a protein encoded by one
species of DNA molecule in a population of variegated DNA
wherein the region of variation appears in one or more
subsequences encoding one or more segments of the
polypeptide having the potential of serving as a binding domain for
the target substance.

A "chimeric protein" is a fusion of a first amino acid
sequence (protein) with a second amino acid sequence
defining a domain foreign to and not substantially
homologous with any domain of the first protein. A chimeric
protein may present a foreign domain which is found (albeit
in a different protein) in an organism which also expresses
the first protein, or it may be an "interspecies",
"intergeneric", etc. fusion of protein structures expressed
by different kinds of organisms.

One amino acid sequence of the chimeric proteins of the
present invention is typically derived from an outer surface
protein of a "genetic package" (GP) as hereafter defined.
A genetic package which displays a PBD on its surface is a
GP(PBD). The
second amino acid sequence is one which, if expressed alone,
would have the characteristics of a protein (or a domain
thereof) but is incorporated into the chimeric protein as a
recognizable domain thereof. It may appear at the amino or
carboxy terminal of the first amino acid sequence (with or
without an intervening spacer), or it may interrupt the
first amino acid sequence. The first amino acid sequence
may correspond exactly to a surface protein of the genetic
package, or it may be modified, e.g., to facilitate the
display of the binding domain.

II. MICRO- AND OTHER MINI-PROTEINS

In the present invention, disulfide bonded
micro-proteins and metal-containing mini-proteins are used both
as IPBDs in verifying a display strategy, and as PPBDs in
actually seeking to obtain a BD with the desired
target-binding characteristics. Unless otherwise stated or
required by context, references herein to IPBDs should be
taken to apply, mutatis mutandis, to PPBDs as well.

For the purpose of the appended claims, a micro-protein
has between about six and about forty residues;
micro-proteins are a subset of mini-proteins, which have
less than about sixty residues. Since micro-proteins form a
subset of mini-proteins, for convenience the term
mini-proteins will be used on occasion to refer to both
disulfide-bonded micro-proteins and metal-coordinated
mini-proteins.

The IPBD may be a mini-protein with a known binding
activity, or one which, while not possessing a known binding
activity, possesses a secondary or higher structure that
lends itself to binding activity (clefts, grooves, etc.).
When the IPBD does have a known binding activity, it need
not have any specific affinity for the target material. The
IPBD need not be identical in sequence to a naturally
occurring mini-protein; it may be a "homologue" with an
amino acid sequence which "substantially corresponds" to
that of a known mini-protein, or it may be wholly
artificial.

In determining whether sequences should be deemed to
"substantially correspond", one should consider the
following issues: the degree of sequence similarity when the
sequences are aligned for best fit according to standard
algorithms, the similarity in the connectivity patterns of
any crosslinks (e.g., disulfide bonds), the degree to which
the proteins have similar three-dimensional structures, as
indicated by, e.g., X-ray diffraction analysis or NMR, and
the degree to which the sequenced proteins have similar
biological activity. In this context, it should be noted
that among the serine protease inhibitors, there are
families of proteins recognized to be homologous in which
there are pairs of members with as little as 30% sequence
homology.

A candidate IPBD should meet the following criteria:
1) a domain exists that will remain stable under the
   conditions of its intended use (the domain may
   comprise the entire protein that will be inserted,
   e.g., α-conotoxin GI (OLIV90a), or CMTI-III (MCWH89)),
2) knowledge of the amino acid sequence is obtainable,
   and
3) a molecule is obtainable having specific and high
   affinity for the IPBD, abbreviated AfM(IPBD).
If only one species of molecule having affinity for
IPBD (AfM(IPBD)) is available, it will be used to: a) detect
the IPBD on the GP surface, b) optimize expression level and
density of the affinity molecule on the matrix, and c)
determine the efficiency and sensitivity of the affinity
separation. One would prefer to have available two species
of AfM(IPBD), one with high and one with moderate affinity
for the IPBD. The species with high affinity would be used
in initial detection and in determining efficiency and
sensitivity, and the species with moderate affinity would be
used in optimization.

If the IPBD is not itself a known binding protein, or
if its native target has not been purified, an antibody
raised against the IPBD may be used as the affinity
molecule. Use of an antibody for this purpose should not be
taken to mean that the antibody is the ultimate target.

There are many candidate IPBDs for which all of the
above information is available or is reasonably practical to
obtain, for example, CMTI-III (29 residues) (CMTI-type
inhibitors are described in OTLE87, FAVE89, WIEC85, MCWH89,
BODE89, HOLA89a,b), heat-stable enterotoxin (ST-Ia of E.
coli) (18 residues) (GUAR89, BHAT86, SEKI85, SHIM87, TAKA85,
TAKE90, THOM85a,b, YOSH85, DALL90, DWAR89, GARI87, GUZM89,
GUZM90, HOUG84, KUBO89, KUPE90, OKAM87, OKAM88, and OKAM90),
α-Conotoxin GI (13 residues) (HASH85, ALMQ89), μ-Conotoxin
GIII (22 residues) (HIDO90), and Conus King Kong micro-
protein (27 residues) (WOOD90). Structural information can
be obtained from X-ray or neutron diffraction studies, NMR,
chemical cross linking or labeling, modeling from known
structures of related proteins, or from theoretical
calculations. 3D structural information obtained by X-ray
diffraction, neutron diffraction or NMR is preferred because
these methods allow localization of almost all of the atoms
to within defined limits. Table 50 lists several preferred
IPBDs.

Mutations may reduce the stability of the PBD. Hence
the chosen IPBD should preferably have a high melting
temperature, e.g., at least 50°C, and preferably be stable
over a wide pH range, e.g., 8.0 to 3.0, but more preferably
11.0 to 2.0, so that the SBDs derived from the chosen IPBD
by mutation and selection-through-binding will retain
sufficient stability. Preferably, the substitutions in the
IPBD yielding the various PBDs do not reduce the melting
point of the domain below about 40°C. It will be appreciated
that mini-proteins contain covalent crosslinks, such as one
or more disulfides, and therefore are likely to be
sufficiently stable.

In vitro, disulfide bridges can form spontaneously in
polypeptides as a result of air oxidation. Matters are more
complicated in vivo. Very few intracellular proteins have
disulfide bridges, probably because a strong reducing
environment is maintained by the glutathione system.
Disulfide bridges are common in proteins that travel or
operate in extracellular spaces, such as snake venoms and
other toxins (e.g., conotoxins, charybdotoxin, bacterial
enterotoxins), peptide hormones, digestive enzymes,
complement proteins, immunoglobulins, lysozymes, protease
inhibitors (BPTI and its homologues, CMTI-III (Cucurbita
maxima trypsin inhibitor III) and its homologues, hirudin,
etc.) and milk proteins.

Disulfide bonds that close tight intrachain loops have
been found in pepsin, thioredoxin, insulin A-chain, silk
fibroin, and lipoamide dehydrogenase. The bridged cysteine
residues are separated by one to four residues along the
polypeptide chain. Model building, X-ray diffraction
analysis, and NMR studies have shown that the α carbon path
of such loops is usually flat and rigid.
There are two types of disulfide bridges in
immunoglobulins. One is the conserved intrachain bridge,
spanning about 60 to 70 amino acid residues and found,
repeatedly, in almost every immunoglobulin domain. Buried
deep between the opposing β sheets, these bridges are
shielded from solvent and ordinarily can be reduced only in
the presence of denaturing agents. The remaining disulfide
bridges are mainly interchain bonds and are located on the
surface of the molecule; they are accessible to solvent and
relatively easily reduced (STEI85). The disulfide bridges of
the micro-proteins of the present invention are intrachain
linkages between cysteines having much smaller chain
spacings.

When a micro-protein contains a plurality of disulfide
bonds, it is preferable that at least two cysteines be
clustered, i.e., are immediately adjacent along the chain
(-C-C-) or are separated by a single amino acid (-C-X-C-).
In either case, the two clustered cysteines become unable to
pair with each other for steric reasons, and the number of
realizable topologies is reduced.
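
The effect of clustering on the number of disulfide topologies can be illustrated with a short enumeration. The following is a minimal sketch, not part of the disclosed method: the function names, the example cysteine positions, and the rule that cysteines one or two residues apart cannot pair are assumptions made for illustration, and an even number of cysteines is assumed.

```python
def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def realizable_topologies(cys_positions, max_cluster_gap=2):
    """Disulfide pairings in which no two clustered cysteines are bonded.

    Assumption: cysteines whose residue numbers differ by 1 (-C-C-) or
    by 2 (-C-X-C-) are treated as unable to pair with each other.
    """
    ordered = sorted(cys_positions)
    return [
        pairing
        for pairing in pairings(ordered)
        if all(abs(a - b) > max_cluster_gap for a, b in pairing)
    ]


if __name__ == "__main__":
    # Four well-separated cysteines: every pairing is allowed.
    print(len(realizable_topologies([3, 8, 13, 18])))
    # Cys3 and Cys4 clustered: the 3-4 pairing is excluded.
    print(len(realizable_topologies([3, 4, 8, 12])))
```

With four unclustered cysteines all three pairings survive; clustering one pair removes the topology in which the clustered cysteines bond to each other.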

An intrachain disulfide bridge connecting amino acids
3 and 8 of a 16 residue polypeptide will be said herein to
have a span of 4. If amino acids 4 and 12 are also
disulfide bonded, then they form a second span of 7.
Together, the four cysteines divide the polypeptide into
four intercysteine segments (1-2, 5-7, 9-11, and 13-16).
(Note that there is no segment between Cys3 and Cys4.) The
connectivity pattern of a crosslinked micro-protein is a
simple description of the relative location of the termini
of the crosslinks. For example, for a micro-protein with
two disulfide bonds, the connectivity pattern "1-3, 2-4"
means that the first crosslinked cysteine is disulfide
bonded to the third crosslinked cysteine (in the primary
sequence), and the second to the fourth.
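
The definitions of span, intercysteine segment, and connectivity pattern given above can be checked mechanically. The sketch below is illustrative only; the function names are hypothetical, and residues are assumed to be numbered from 1.

```python
def span(i, j):
    """Number of residues lying strictly between bonded residues i and j."""
    return abs(j - i) - 1


def intercysteine_segments(length, cys_positions):
    """Return (start, end) ranges of the non-cysteine stretches of the chain."""
    segments = []
    start = 1
    for cys in sorted(cys_positions):
        if cys > start:
            segments.append((start, cys - 1))
        start = cys + 1
    if start <= length:
        segments.append((start, length))
    return segments


def connectivity_pattern(bonds):
    """Describe bonds by the rank of each cysteine along the sequence."""
    ordered = sorted(pos for bond in bonds for pos in bond)
    rank = {pos: n for n, pos in enumerate(ordered, start=1)}
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ", ".join(f"{a}-{b}" for a, b in pairs)


if __name__ == "__main__":
    bonds = [(3, 8), (4, 12)]
    print([span(a, b) for a, b in bonds])
    print(intercysteine_segments(16, [3, 8, 4, 12]))
    print(connectivity_pattern(bonds))
```

Applied to the 16-residue example in the text, the functions reproduce the spans of 4 and 7, the four segments 1-2, 5-7, 9-11 and 13-16, and the pattern "1-3, 2-4".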

The degree to which the crosslink constrains the
conformational freedom of the mini-protein, and the degree
to which it stabilizes the mini-protein, may be assessed by
a number of means. These include absorption spectroscopy
(which can reveal whether an amino acid is buried or
exposed), circular dichroism studies (which provide a
general picture of the helical content of the protein),
nuclear magnetic resonance (NMR) spectroscopy (which reveals
the number of nuclei in a particular chemical environment as
well as the mobility of nuclei), and X-ray or neutron
diffraction analysis of protein crystals. The stability of
the mini-protein may be ascertained by monitoring the
changes in absorption at various wavelengths as a function
of temperature, pH, etc.; buried residues become exposed as
the protein unfolds. Similarly, the unfolding of the
mini-protein as a result of denaturing conditions results in
changes in NMR line positions and widths. Circular
dichroism (CD) spectra are extremely sensitive to
conformation.

The variegated disulfide-bonded micro-proteins of the
present invention fall into several classes.

Class I micro-proteins are those featuring a single
pair of cysteines capable of interacting to form a disulfide
bond, said bond having a span of no more than about nine
residues. This disulfide bridge preferably has a span of at
least two residues; this is a function of the geometry of
the disulfide bond. When the spacing is two or three
residues, one residue is preferably glycine in order to
reduce the strain on the bridged residues. The upper limit
on spacing is less precise; however, in general, the greater
the spacing, the less the constraint on conformation imposed
on the linearly intermediate amino acid residues by the
disulfide bond.

The main chain of such a peptide has very little
freedom, but is not stressed. The free energy released when
the disulfide forms exceeds the free energy lost by the
main chain when locked into a conformation that brings the
cysteines together. Having lost the free energy of
disulfide formation, the proximal ends of the side groups
are held in more or less fixed relation to each other. When
binding to a target, the domain does not need to expend free
energy getting into the correct conformation. The domain
cannot jump into some other conformation and bind a
non-target.

A disulfide bridge with a span of 4 or 5 is especially
preferred. If the span is increased to 6, the constraining
influence is reduced. In this case, we prefer that at least
one of the enclosed residues be an amino acid that imposes
restrictions on the main-chain geometry. Proline imposes
the most restriction. Valine and isoleucine restrict the
main chain to a lesser extent. The preferred position for
this constraining non-cysteine residue is adjacent to one of
the invariant cysteines; however, it may be one of the other
bridged residues. If the span is seven, we prefer to
include two amino acids that limit main-chain conformation.
These amino acids could be at any of the seven positions,
but are preferably the two bridged residues that are
immediately adjacent to the cysteines. If the span is eight
or nine, additional constraining amino acids may be
provided.

While a class I micro-protein may have up to 40 amino
acids, more preferably it is no more than 20 amino acids.

The disulfide bond of a class I micro-protein is
exposed to solvent. Thus, one usually should avoid exposing
the variegated population of GPs that display class I
micro-proteins to reagents that rupture disulfides.

Class II micro-proteins are those featuring a single
disulfide bond having a span of greater than nine amino
acids. The bridged amino acids form secondary structures
which help to stabilize their conformation. Preferably,
these intermediate amino acids form hairpin supersecondary
structures such as those schematized below:

Based on studies of known proteins, one may calculate
the propensity of a particular residue, or of a particular
dipeptide or tripeptide, to be found in an α helix, β strand
or reverse turn. The normalized frequencies of occurrence
of the amino acid residues in these secondary structures are
given in Table 6-4 of CREI84. For a more detailed treatment
of the prediction of secondary structure from the amino acid
sequence, see Chapter 6 of SCHU79.

In designing a suitable hairpin structure, one may copy
an actual structure from a protein whose three-dimensional
conformation is known, design the structure using frequency
data, or combine the two approaches. Preferably, one or
more actual structures are used as a model, and the
frequency data is used to determine which mutations can be
made without disrupting the structure.

Preferably, no more than three amino acids lie between
the cysteine and the beginning or end of the α helix or β
strand.

More complex structures (such as a double hairpin) are
also possible.

Class IIIa micro-proteins are those featuring two
disulfide bonds. They optionally may also feature secondary
structures such as those discussed above with regard to
Class II micro-proteins. With two disulfide bonds, there
are three possible topologies; if desired, the number of
realizable disulfide bonding topologies may be reduced by
clustering cysteines as in heat-stable enterotoxin ST-Ia.

Class IIIb micro-proteins are those featuring three or
more disulfide bonds and preferably at least one cluster of
cysteines as previously described.

Metal Finger Mini-Proteins. The present invention also
relates to mini-proteins which are not crosslinked by
disulfide bonds, e.g., analogues of finger proteins. Finger
proteins are characterized by finger structures in which a
metal ion is coordinated by two Cys and two His residues,
forming a tetrahedral arrangement around it. The metal ion
is most often zinc(II), but may be iron, copper, cobalt,
etc. The "finger" has the consensus sequence (Phe or Tyr)-
(1 AA)-Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-
His-(3 AAs)-His-(5 AAs) (BERG88; GIBS88). While finger
proteins typically contain many repeats of the finger motif,
it is known that a single finger will fold in the presence
of zinc ions (FRAN87; PARR88). There is some dispute as to
whether two fingers are necessary for binding to DNA. The
present invention encompasses mini-proteins with either one
or two fingers. Other combinations of side groups can lead
to formation of crosslinks involving multivalent metal ions.
Summers (SUMM91), for example, reports an 18-amino-acid
mini-protein found in the capsid protein of HIV-1-F1 and
having three cysteines and one histidine that bind a zinc
atom. It is to be understood that the target need not be a
nucleic acid.
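
The finger consensus sequence quoted above can be written as a pattern over one-letter amino acid codes. The sketch below is one possible rendering, offered as an illustration rather than a validated motif scanner; the names FINGER and SYNTHETIC_FINGER are hypothetical, and the test sequence is an artificial poly-alanine scaffold, not a natural finger.

```python
import re

# Consensus from the text, one element per line (one-letter codes).
FINGER = re.compile(
    r"[FY]"        # Phe or Tyr
    r"[A-Z]"       # 1 AA
    r"C"           # first Cys
    r"[A-Z]{2,4}"  # 2-4 AAs
    r"C"           # second Cys
    r"[A-Z]{3}"    # 3 AAs
    r"F"           # Phe
    r"[A-Z]{5}"    # 5 AAs
    r"L"           # Leu
    r"[A-Z]{2}"    # 2 AAs
    r"H"           # first His
    r"[A-Z]{3}"    # 3 AAs
    r"H"           # second His
    r"[A-Z]{5}"    # 5 AAs
)

# Artificial sequence built element by element to satisfy the consensus.
SYNTHETIC_FINGER = (
    "Y" + "A" + "C" + "AA" + "C" + "AAA" + "F" + "AAAAA"
    + "L" + "AA" + "H" + "AAA" + "H" + "AAAAA"
)


def find_fingers(sequence):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in FINGER.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_fingers("GG" + SYNTHETIC_FINGER + "GG"))
```

In a variegation plan of the kind described later for metal finger proteins, the positions matched by the fixed letters (Cys, His, and optionally Phe/Tyr, Phe and Leu) would be the ones held invariant.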
G. Modified PBDs

There exist a number of enzymes and chemical reagents
that can selectively modify certain side groups of proteins,
including: protein-tyrosine kinase, Ellman's reagent,
methyl transferases (that methylate GLU side groups), serine
kinases, proline hydroxylases, vitamin-K dependent enzymes
that convert GLU to GLA, maleic anhydride, and alkylating
agents. Treatment of the variegated population of GP(PBD)s
with one of these enzymes or reagents will modify the side
groups affected by the chosen enzyme or reagent. Enzymes
and reagents that do not kill the GP are much preferred.
Such modification of side groups can directly affect the
binding properties of the displayed PBDs. Using affinity
separation methods, we enrich for the modified GPs that bind
the predetermined target. Since the active binding domain
is not entirely genetically specified, we must repeat the
post-morphogenesis modification at each enrichment round.
This approach is particularly appropriate with mini-protein
IPBDs because we envision chemical synthesis of these SBDs.

III. VARIEGATION STRATEGY: MUTAGENESIS TO OBTAIN POTENTIAL
BINDING DOMAINS WITH DESIRED DIVERSITY

III.A. Generally

When the number of different amino acid sequences
obtainable by mutation of the domain is large when compared
to the number of different domains which are displayable in
detectable amounts, the efficiency of the forced evolution
is greatly enhanced by careful choice of which residues are
to be varied. First, residues of a known protein which are
likely to affect its binding activity (e.g., surface
residues) and not likely to unduly degrade its stability are
identified. Then all or some of the codons encoding these
residues are varied simultaneously to produce a variegated
population of DNA. Groups of surface residues that are
close enough together on the surface to touch one molecule
of target simultaneously are preferred sets for simultaneous
variegation. The variegated population of DNA is used to
express a variety of potential binding domains, whose
ability to bind the target of interest may then be
evaluated.

The method of the present invention is thus further
distinguished from other methods in the nature of the highly
variegated population that is produced and from which novel
binding proteins are selected. We force the displayed
potential binding domain to sample the nearby "sequence
space" of related amino-acid sequences in an efficient,
organized manner. Four goals guide the various variegation
plans used herein, preferably: 1) a very large number (e.g.,
10^7) of variants is available, 2) a very high percentage of
the possible variants actually appears in detectable
amounts, 3) the frequency of appearance of the desired
variants is relatively uniform, and 4) variation occurs only
at a limited number of amino-acid residues, most preferably
at residues having side groups directed toward a common
region on the surface of the potential binding domain.
This is to be distinguished from the simple use of
indiscriminate mutagenic agents such as radiation and
hydroxylamine to modify a gene, where there is no (or very
oblique) control over the site of mutation. Many of the
mutations will affect residues that are not a part of the
binding domain. When chemical mutagens are directed toward
the whole genome, most mutations occur in genes other than
the one encoding the potential binding domain. Moreover,
since at a reasonable level of mutagenesis, any modified
codon is likely to be characterized by a single base change,
only a limited and biased range of possibilities will be
explored. Equally remote is the use of site-specific
mutagenesis techniques employing mutagenic oligonucleotides
of nonrandomized sequence, since these techniques do not
lend themselves to the production and testing of a large
number of variants. While focused random mutagenesis
techniques are known, the importance of controlling the
distribution of variation has been largely overlooked.

The potential binding domains are first designed at the
amino acid level. Once we have identified which residues
are to be mutagenized, and which mutations to allow at those
positions, we may then design the variegated DNA which is to
encode the various PBDs so as to assure that there is a
reasonable probability that if a PBD has an affinity for the
target, it will be detected. Of course, the number of
independent transformants obtained and the sensitivity of
the affinity separation technology will impose limits on the
extent of variegation possible within any single round of
variegation.

There are many ways to generate diversity in a protein.
(See RICH86, CARU85, and OLIP86.) At one extreme, we vary
a few residues of the protein as much as possible (inter
alia see CARU85, CARU87, RICH86, and WHAR86). We will call
this approach "Focused Mutagenesis". A typical "Focused
Mutagenesis" strategy is to pick a set of five to seven
residues and vary each through 13-20 possibilities. An
alternative plan of mutagenesis ("Diffuse Mutagenesis") is
to vary many more residues through a more limited set of
choices (See VERS86a and PAKU86). The variegation pattern
adopted may fall between these extremes, e.g., two residues
varied through all twenty amino acids, two more through only
two possibilities, and a fifth into ten of the twenty amino
acids.

There is no fixed limit on the number of codons which
can be mutated simultaneously. However, it is desirable to
adopt a mutagenesis strategy which results in a reasonable
probability that a possible PBD sequence is in fact
displayed by at least one genetic package. Preferably, the
probability that a mutein encoded by the vgDNA and composed
of the least favored amino acids at each variegated position
will be displayed by at least one independent transformant
in the library is at least 0.50, and more preferably at
least 0.90. (Muteins composed of more favored amino acids
would of course be more likely to occur in the same
library.)
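
The 0.50 and 0.90 criteria can be related to library size with a simple sampling model. The sketch below assumes that each independent transformant carries the least-favored sequence with the same probability f and that transformants are independent draws, so that P = 1 - (1 - f)^N; this model and the function names are illustrative assumptions, not statements from the specification.

```python
import math


def prob_displayed(freq, n_transformants):
    """P(at least one of N independent transformants carries the sequence)."""
    # 1 - (1 - f)^N, written with log1p/expm1 to stay accurate for tiny f.
    return -math.expm1(n_transformants * math.log1p(-freq))


def transformants_needed(freq, target_prob):
    """Smallest N for which prob_displayed(freq, N) reaches target_prob."""
    return math.ceil(math.log1p(-target_prob) / math.log1p(-freq))


if __name__ == "__main__":
    # Illustrative case: five NNN codons, least-favored residue encoded by
    # one codon of 64 at each position, so f = (1/64)**5.
    f = (1 / 64) ** 5
    for p in (0.50, 0.90):
        print(p, transformants_needed(f, p))
```

Under these assumptions the required number of transformants scales roughly as ln(1/(1 - P))/f, which is why limiting the number of variegated codons, or evening out the codon distribution, has such a large effect on what can be sampled.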

Preferably, the variegation is such as will cause a
typical transformant population to display 10^6-10^7
different amino acid sequences by means of preferably not
more than 10-fold more (more preferably not more than
3-fold) different DNA sequences.

For a Class I micro-protein that lacks α helices and β
strands, one will, in any given round of mutation,
preferably variegate each of 4-8 non-cysteine codons so that
they each encode at least eight of the 20 possible amino
acids. The variegation at each codon could be customized to
that position. Preferably, cysteine is not one of the
potential substitutions, though it is not excluded.

When the mini-protein is a metal finger protein, in a
typical variegation strategy, the two Cys and two His
residues, and optionally also the aforementioned Phe/Tyr,
Phe and Leu residues, are held invariant and a plurality
(usually 5-10) of the other residues are varied.

When the micro-protein is of the type featuring one or
more α helices and β strands, the set of potential amino
acid modifications at any given position is picked to favor
those which are less likely to disrupt the secondary
structure at that position. Since the number of
possibilities at each variable amino acid is more limited,
the total number of variable amino acids may be greater
without altering the sampling efficiency of the selection
process.

For class III micro-proteins, preferably not more than
20 and more preferably 5-10 codons will be variegated.
However, if diffuse mutagenesis is employed, the number of
codons which are variegated can be higher.

While variegation normally will involve the
substitution of one amino acid for another at a designated
variable codon, it may involve the insertion or deletion of
amino acids as well.

III.B. Identification of Residues to be Varied

We now consider the principles that guide our choice of
residues of the IPBD to vary. A key concept is that only
structured proteins exhibit specific binding, i.e., can bind
to a particular chemical entity to the exclusion of most
others. Thus the residues to be varied are chosen with an
eye to preserving the underlying IPBD structure.
Substitutions that prevent the PBD from folding will cause
GPs carrying those genes to bind indiscriminately so that
they can easily be removed from the population.
Substitutions of amino acids that are exposed to solvent are
less likely to affect the 3D structure than are
substitutions at internal loci. (See PAKU86, REID88a,
EISE85, SCHU79, p169-171 and CREI84, p239-245, 314-315).
Internal residues are frequently conserved and the amino
acid type cannot be changed to a significantly different
type without substantial risk that the protein structure
will be disrupted. Nevertheless, some conservative changes
of internal residues, such as I to L or F to Y, are
tolerated. Such conservative changes subtly affect the
placement and dynamics of adjacent protein residues and such
"fine tuning" may be useful once an SBD is found.
Insertions and deletions are more readily tolerated in loops
than elsewhere (THOR88).

Data about the IPBD and the target that are useful in
deciding which residues to vary in the variegation cycle
include: 1) 3D structure, or at least a list of residues on
the surface of the IPBD, 2) list of sequences homologous to
IPBD, and 3) model of the target molecule or a stand-in for
the target.

III.C. Determining the Substitution Set for Each Parental
Residue

Having picked which residues to vary, we now decide the
range of amino acids to allow at each variable residue. The
total level of variegation is the product of the number of
variants at each varied residue. Each varied residue can
have a different scheme of variegation, producing 2 to 20
different possibilities. The set of amino acids which are
potentially encoded by a given variegated codon is called
its "substitution set".
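
Because the total level of variegation is a simple product, it is easy to tabulate for a candidate plan. The sketch below is illustrative; the function name is hypothetical, and the example plans are the mixed pattern and the "Focused Mutagenesis" range mentioned earlier in this section.

```python
from math import prod


def variegation_level(substitution_set_sizes):
    """Total number of amino acid sequences: product of the set sizes."""
    return prod(substitution_set_sizes)


if __name__ == "__main__":
    # Two residues through all 20 amino acids, two through 2
    # possibilities, and a fifth through 10.
    print(variegation_level([20, 20, 2, 2, 10]))
    # Focused mutagenesis extremes: 5 residues x 13 choices,
    # 7 residues x 20 choices.
    print(variegation_level([13] * 5), variegation_level([20] * 7))
```

Comparing these totals with the number of independent transformants obtainable indicates whether a plan can be sampled thoroughly in a single round.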

The computer that controls a DNA synthesizer, such as
the Milligen 7500, can be programmed to synthesize any base
of an oligo-nt with any distribution of nts by taking some
nt substrates (e.g., nt phosphoramidites) from each of two
or more reservoirs. Alternatively, nt substrates can be
mixed in any ratios and placed in one of the extra
reservoirs for so-called "dirty bottle" synthesis. Each
codon could be programmed differently. The "mix" of bases at
each nucleotide position of the codon determines the
relative frequency of occurrence of the different amino
acids encoded by that codon.

Simply variegated codons are those in which those
nucleotide positions which are degenerate are obtained from
a mixture of two or more bases mixed in equimolar
proportions. These mixtures are described in this
specification by means of the standardized "ambiguous
nucleotide" code. In this code, for example, in the
degenerate codon "SNT", "S" denotes an equimolar mixture of
bases G and C, "N", an equimolar mixture of all four bases,
and "T", the single invariant base thymidine.
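
The substitution set of a simply variegated codon can be derived by expanding the ambiguity code against the standard genetic code. The sketch below is an illustration under that assumption (standard code, equimolar mixtures); the names CODON_TABLE, AMBIGUITY and substitution_set are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Standard ambiguous-nucleotide code.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def substitution_set(degenerate_codon):
    """Count how many codons of an equimolar mix encode each amino acid."""
    choices = (AMBIGUITY[base] for base in degenerate_codon.upper())
    return Counter(CODON_TABLE["".join(c)] for c in product(*choices))


if __name__ == "__main__":
    print(sorted(substitution_set("SNT").items()))
```

For "SNT" the expansion gives eight codons, each encoding a different amino acid, so the eight members of the substitution set are equally represented and no stop codon ("*") appears.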

Complexly variegated codons are those in which at least
one of the three positions is filled by a base from an other
than equimolar mixture of two or more bases.

Either simply or complexly variegated codons may be
used to achieve the desired substitution set.

If we have no information indicating that a particular
amino acid or class of amino acid is appropriate, we strive
to substitute all amino acids with equal probability because
representation of one mini-protein above the detectable
level is wasteful. Equal amounts of all four nts at each
position in a codon (NNN) yield the amino acid distribution
in which each amino acid is present in proportion to the
number of codons that code for it. This distribution has
the disadvantage of giving two basic residues for every
acidic residue. In addition, six times as much R, S, and L
as W or M occur. If five codons are synthesized with this
distribution, each of the 243 sequences encoding some
combination of L, R, and S is 7776 times more abundant than
each of the 32 sequences encoding some combination of W and
M. To have five Ws present at detectable levels, we must
have each of the (L,R,S) sequences present in 7776-fold
excess.
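
The figures quoted for the NNN distribution follow directly from codon counts in the standard genetic code. The sketch below reproduces them as a check; it assumes the standard code and equimolar bases, counts LYS and ARG as basic and ASP and GLU as acidic, and uses hypothetical function names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of NNN codons encoding each amino acid.
NNN_COUNTS = Counter(CODON_TABLE.values())


def relative_abundance(seq_a, seq_b):
    """How many times more abundant peptide seq_a is than seq_b under NNN."""
    return Fraction(prod(NNN_COUNTS[aa] for aa in seq_a),
                    prod(NNN_COUNTS[aa] for aa in seq_b))


if __name__ == "__main__":
    print({aa: NNN_COUNTS[aa] for aa in "LRSWM"})
    basic = NNN_COUNTS["K"] + NNN_COUNTS["R"]
    acidic = NNN_COUNTS["D"] + NNN_COUNTS["E"]
    print(basic, acidic)
    print(3 ** 5, 2 ** 5, relative_abundance("LRSLR", "WMWMW"))
```

The 7776-fold figure is 6^5: each of the five positions contributes a six-to-one codon ratio between L, R or S and W or M.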

Particular amino acid residues can influence the
tertiary structure of a defined polypeptide in several ways,
including by:

a) affecting the flexibility of the polypeptide main
   chain,
b) adding hydrophobic groups,
c) adding charged groups,
d) allowing hydrogen bonds, and
e) forming cross-links, such as disulfides, chelation to
   metal ions, or bonding to prosthetic groups.

Lundeen (LUND86) has tabulated the frequencies of amino
acids in helices, β strands, turns, and coil in proteins of
known 3D structure and has distinguished between CYSs having
free thiol groups and half cystines. He reports that free
CYS is found most often in helices while half cystines are
found more often in β sheets. Half cystines are, however,
regularly found in helices. Pease et al. (PEAS90)
constructed a peptide having two cystines; one end of each
is in a very stable α helix. Apamin has a similar structure
(WEMM83, PEAS88).

Flexibility:

GLY is the smallest amino acid, having two hydrogens
attached to the Cα. Because GLY has no Cβ, it confers the
most flexibility on the main chain. Thus GLY occurs very
frequently in reverse turns, particularly in conjunction
with PRO, ASP, ASN, SER, and THR.

The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET, PHE,
TYR, TRP, ARG, HIS, GLU, GLN, and LYS have unbranched β
carbons. Of these, the side groups of SER, ASP, and ASN
frequently make hydrogen bonds to the main chain and so can
take on main-chain conformations that are energetically
unfavorable for the others. VAL, ILE, and THR have branched
β carbons, which makes the extended main-chain conformation
more favorable. Thus VAL and ILE are most often seen in β
sheets. Because the side group of THR can easily form
hydrogen bonds to the main chain, it has less tendency to
exist in a β sheet.

The main chain of proline is particularly constrained
by the cyclic side group. The φ angle is always close to
-60°. Most prolines are found near the surface of the
protein.

Charge:

LYS and ARG carry a single positive charge at any pH
below 10.4 or 12.0, respectively. Nevertheless, the
methylene groups, four and three respectively, of these
amino acids are capable of hydrophobic interactions. The
guanidinium group of ARG is capable of donating five
hydrogens simultaneously, while the amino group of LYS can
donate only three. Furthermore, the geometries of these
groups are quite different, so that these groups are often
not interchangeable.

ASP and GLU carry a single negative charge at any pH
above ~4.5 and 4.6, respectively. Because ASP has but one
methylene group, few hydrophobic interactions are possible.
The geometry of ASP lends itself to forming hydrogen bonds
to main-chain nitrogens, which is consistent with ASP being
found very often in reverse turns and at the beginning of
helices. GLU is more often found in α helices and
particularly in the amino-terminal portion of these helices
because the negative charge of the side group has a
stabilizing interaction with the helix dipole (NICH88,
SALI88).

HIS has an ionization pK in the physiological range,
viz. 6.2. This pK can be altered by the proximity of
charged groups or of hydrogen donators or acceptors. HIS is
capable of forming bonds to metal ions such as zinc, copper,
and iron.

TYR, and TRP can participate in hydrogen bonds.

Cross links:

The most important form of cross link is the disulfide
bond formed between the thiols of CYS residues. In a
suitably oxidizing environment, these bonds form
spontaneously. These bonds can greatly stabilize a
particular conformation of a protein or mini-protein. When
a mixture of oxidized and reduced thiol reagents is
present, exchange reactions take place that allow the most
stable conformation to predominate. Concerning disulfides
in proteins and peptides, see also KATZ90, MATS89, PERR84,
PERR86, SAUE86, WELL86, JANA89, HORV89, KISH85, and SCHN86.

Other cross links that form without need of specific
enzymes include:
1) (CYS)4:Fe             Rubredoxin (in CREI84, p.376)
2) (CYS)4:Zn             Aspartate Transcarbamylase (in
                         CREI84, p.376) and Zn-fingers
                         (HARD90)
3) (HIS)2(MET)(CYS):Cu   Azurin (in CREI84, p.376) and
                         Basic "Blue" Cu Cucumber protein
                         (GUSS88)
4) (HIS)4:Cu             CuZn superoxide dismutase
5) (CYS)4:(Fe4S4)        Ferredoxin (in CREI84, p.376)
6) (CYS)2(HIS)2:Zn       Zinc-fingers (GIBS88, SUMM91)
7) (CYS)3(HIS):Zn        Zinc-fingers (GAUS87, GIBS88)

Cross links having (HIS)2(MET)(CYS):Cu have the potential
advantage that HIS and MET cannot form other cross links
without Cu.

Simply Variegated Codons 

The following simply variegated codons are useful
because they encode a relatively balanced set of amino
acids:

1) SNT, which encodes the set [L,P,H,R,V,A,D,G]: a) one
   acidic (D) and one basic (R), b) both aliphatic (L,V)
   and aromatic hydrophobics (H), c) large (L,R,H) and
   small (G,A) side groups, d) rigid (P) and flexible (G)
   amino acids, e) each amino acid encoded once;

2) RNG, which encodes the set [M,T,K,R,V,A,E,G]: a) one
   acidic and two basic (not optimal, but acceptable), b)
   hydrophilics and hydrophobics, c) each amino acid
   encoded once.

3) RMG, which encodes the set [T,K,A,E]: a) one acidic, one
   basic, one neutral hydrophilic, b) three favor α
   helices, c) each amino acid encoded once.

4) VNT, which encodes the set [L,P,H,R,I,T,N,S,V,A,D,G]:
   a) one acidic, one basic, b) all classes: charged,
   neutral hydrophilic, hydrophobic, rigid and flexible,
   etc., c) each amino acid encoded once.

5) RRS, which encodes the set [N,S,K,R,D,E,G²]: a) two
   acidics, two basics, b) two neutral hydrophilics, c)
   only glycine encoded twice.

6) NNT, which encodes the set
   [F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G]: a) sixteen DNA
   sequences provide fifteen different amino acids; only
   serine is repeated, all others are present in equal
   amounts (This allows very efficient sampling of the
   library.), b) there are equal numbers of acidic and
   basic amino acids (D and R, once each), c) all major
   classes of amino acids are present: acidic, basic,
   aliphatic hydrophobic, aromatic hydrophobic, and neutral
   hydrophilic.

7) NNG, which encodes the set
   [L²,R²,S,W,P,Q,M,T,K,V,A,E,G,stop]: a) fair
   preponderance of residues that favor formation of
   α-helices [L,M,A,Q,K,E; and, to a lesser extent,
   S,R,T]; b) encodes 13 different amino acids. (VHG
   encodes a subset of the set encoded by NNG; this subset
   comprises 9 amino acids in nine different DNA sequences,
   with equal acids and bases, and 5/9 being α
   helix-favoring.)
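The sets listed above can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and the usual IUPAC ambiguity codes (R = A/G, M = A/C, S = C/G, V = A/C/G, H = A/C/T, N = any); the function name encoded_set is ours and purely illustrative.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def encoded_set(vg_codon):
    """Count how many DNA sequences of a variegated codon encode each residue."""
    choices = [IUPAC[base] for base in vg_codon]
    return Counter(CODE["".join(c)] for c in product(*choices))


for name in ("SNT", "RNG", "RMG", "VNT", "RRS", "NNT", "NNG", "VHG"):
    counts = encoded_set(name)
    n_aa = len([aa for aa in counts if aa != "*"])
    print(name, n_aa, "amino acids:", dict(sorted(counts.items())))
```

A count of 2 for a residue corresponds to the superscript 2 used in the lists above (for example G in RRS, and L and R in NNG).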

For the initial variegation, NNT is preferred in most
cases. However, when the codon is encoding an amino acid to
be incorporated into an α helix, NNG is preferred.

Below, we analyze several simple variegations as to the 
efficiency with which the libraries can be sampled. 

Libraries of random hexapeptides encoded by (NNK)6 have
been reported (SCOT90, CWIR90). Table 130 shows the
expected behavior of such libraries. NNK produces single
codons for PHE, TYR, CYS, TRP, HIS, GLN, ILE, MET, ASN, LYS,
ASP, and GLU (α set); two codons for each of VAL, ALA, PRO,
THR, and GLY (Φ set); and three codons for each of LEU, ARG,
and SER (Ω set). We have separated the 64,000,000 possible
sequences into 28 classes, shown in Table 130A, based on the
number of amino acids from each of these sets. The largest
class is ΦΩαααα with ~14.6% of the possible sequences.
Aside from any selection, all the sequences in one class
have the same probability of being produced. Table 130B
shows the probability that a given DNA sequence taken from
the (NNK)6 library will encode a hexapeptide belonging to
one of the defined classes; note that only ~6.3% of DNA
sequences belong to the ΦΩαααα class.
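The class bookkeeping described here can be reproduced with a short calculation. This is only a sketch: the keys "alpha", "phi" and "omega" stand for the one-, two- and three-codon sets, and the divisor of 31 sense codons assumes that the single NNK stop codon never appears in the displayed library.

```python
from math import factorial

# (amino acids in the set, codons per amino acid) under NNK, as given in the text.
SETS = {"alpha": (12, 1), "phi": (5, 2), "omega": (3, 3)}
SENSE_CODONS = 31  # 32 NNK codons less the single stop (assumption: stops vanish)


def nnk_classes(length=6):
    """Map (n_alpha, n_phi, n_omega) -> (fraction of peptides, fraction of DNA)."""
    (na, ca), (np_, cp), (nw, cw) = SETS["alpha"], SETS["phi"], SETS["omega"]
    classes = {}
    for a in range(length + 1):
        for p in range(length + 1 - a):
            w = length - a - p
            # Number of ways to arrange the three set labels along the peptide.
            orderings = factorial(length) // (
                factorial(a) * factorial(p) * factorial(w)
            )
            peptides = orderings * na**a * np_**p * nw**w
            dna = peptides * ca**a * cp**p * cw**w
            classes[(a, p, w)] = (
                peptides / 20**length,
                dna / SENSE_CODONS**length,
            )
    return classes


classes = nnk_classes()
largest = max(classes, key=lambda k: classes[k][0])
print(len(classes), "classes; largest:", largest, classes[largest])
```

Under these assumptions the largest class has four α, one Φ and one Ω residue, holding about 14.6% of peptide sequences but only about 6.3% of DNA sequences.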

Table 130C shows the expected numbers of sequences in
each class for libraries containing various numbers of
independent transformants (viz. 10^6, 3·10^6, 10^7, 3·10^7,
10^8, 3·10^8, 10^9, and 3·10^9). At 10^6 independent
transformants (ITs), we expect to see 56% of the ΩΩΩΩΩΩ
class, but only 0.1% of the αααααα class. The vast majority
of sequences seen come from classes for which less than 10%
of the class is sampled. Suppose a peptide from, for
example, class ΦΦΩΩαα is isolated by fractionating the
library for binding to a target. Consider how much we know
about peptides that are related to the isolated sequence.
Because only 4% of the ΦΦΩΩαα class was sampled, we cannot
conclude that the amino acids from the Ω set are in fact the
best from the Ω set. We might have LEU at position 2, but
ARG or SER could be better. Even if we isolate a peptide of
the ΩΩΩΩΩΩ class, there is a noticeable chance that better
members of the class were not present in the library.
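The sampling percentages quoted in this analysis are consistent with a simple Poisson estimate, sketched here. The Poisson model and the 31 sense codons are our assumptions; the set names "alpha", "phi" and "omega" again stand for the one-, two- and three-codon sets.

```python
from math import exp

SENSE_CODONS = 31  # NNK codons remaining when the stop codon vanishes (assumed)
CODONS = {"alpha": 1, "phi": 2, "omega": 3}  # codons per amino acid in each set


def fraction_sampled(n_its, composition):
    """Expected fraction of a class seen among n_its independent transformants.

    composition names the set of each residue. Every peptide in a class is
    encoded by the same number of DNA sequences, so each has the same chance
    p of appearing in one transformant; under a Poisson model the chance of
    seeing it at least once is 1 - exp(-n_its * p).
    """
    dna_per_peptide = 1
    for set_name in composition:
        dna_per_peptide *= CODONS[set_name]
    p = dna_per_peptide / SENSE_CODONS ** len(composition)
    return 1.0 - exp(-n_its * p)


print(fraction_sampled(1e6, ["omega"] * 6))            # all-omega class
print(fraction_sampled(1e6, ["alpha"] * 6))            # all-alpha class
print(fraction_sampled(1e6, ["phi"] * 2 + ["omega"] * 2 + ["alpha"] * 2))
print(fraction_sampled(1e7, ["alpha"] * 6))            # all-alpha, 10^7 ITs
```

With these assumptions the four calls give roughly 56%, 0.1%, 4% and 1.1%, in line with the figures stated for 10^6 and 10^7 ITs.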

With a library of 10^7 ITs, we see that several classes
have been completely sampled, but that the αααααα class is
only 1.1% sampled. At 7.6·10^7 ITs, we expect display of 50%
of all amino-acid sequences, but the classes containing
three or more amino acids of the α set are still poorly
sampled. To achieve complete sampling of the (NNK)6 library
requires about 3·10^9 ITs, 10-fold larger than the largest
(NNK)6 library so far reported.

Table 131 shows expectations for a library encoded by
(NNT)4(NNG)2. The expectations of abundance are independent
of the order of the codons or of interspersed unvaried
codons. This library encodes 0.133 times as many amino-acid
sequences, but there are only 0.0165 times as many DNA
sequences. Thus 5.0·10^7 ITs (i.e. 60-fold fewer than
required for (NNK)6) gives almost complete sampling of the
library. The results would be slightly better for (NNT)6
and slightly, but not much, worse for (NNG)6. The
controlling factor is the ratio of DNA sequences to
amino-acid sequences.

Table 132 shows the ratio of #DNA sequences/#AA
sequences for codons NNK, NNT, and NNG. For NNK and NNG, we
have assumed that the PBD is displayed as part of an
essential gene, such as gene III in Ff phage, as is
indicated by the phrase "assuming stops vanish". It is not
in any way required that such an essential gene be used. If
a non-essential gene is used, the analysis would be slightly
different; sampling of NNK and NNG would be slightly less
efficient. Note that (NNT)6 gives 3.6-fold more amino-acid
sequences than (NNK)5 but requires 1.7-fold fewer DNA
sequences. Note also that (NNT)7 gives twice as many
amino-acid sequences as (NNK)6, but 3.3-fold fewer DNA
sequences.
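The library-size comparisons in the preceding paragraphs follow from per-codon counts. A minimal sketch, assuming stops vanish so that NNK contributes 31 sense codons and NNG 15 (the table name CODON_STATS and the helper library_size are illustrative only):

```python
# (distinct amino acids, sense codons) per variegated codon, stops assumed to vanish
CODON_STATS = {"NNK": (20, 31), "NNT": (15, 16), "NNG": (13, 15)}


def library_size(codons):
    """Return (amino-acid sequences, DNA sequences) for a list of vg codons."""
    n_aa = n_dna = 1
    for codon in codons:
        aa, dna = CODON_STATS[codon]
        n_aa *= aa
        n_dna *= dna
    return n_aa, n_dna


nnk5 = library_size(["NNK"] * 5)
nnk6 = library_size(["NNK"] * 6)
nnt6 = library_size(["NNT"] * 6)
nnt7 = library_size(["NNT"] * 7)
mixed = library_size(["NNT"] * 4 + ["NNG"] * 2)

print("(NNT)4(NNG)2 vs (NNK)6:", mixed[0] / nnk6[0], mixed[1] / nnk6[1])
print("(NNT)6 vs (NNK)5:", nnt6[0] / nnk5[0], nnk5[1] / nnt6[1])
print("(NNT)7 vs (NNK)6:", nnt7[0] / nnk6[0], nnk6[1] / nnt7[1])
```

These counts reproduce the quoted factors to within rounding (about 0.134 and 0.0166 for the mixed library, 3.56 and 1.71 for (NNT)6 against (NNK)5); for (NNT)7 against (NNK)6 the amino-acid ratio comes out near 2.7, which the text rounds to "twice".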

Thus, while it is possible to use a simple mixture
(NNS, NNK or NNN) to obtain at a particular position all
twenty amino acids, these simple mixtures lead to a highly
biased set of encoded amino acids. This problem can be
overcome by use of complexly variegated codons.

Complexly Variegated Codons

The nt distribution ("fxS") within the codon that
allows all twenty amino acids and that yields the largest
ratio of abundance of the least favored amino acid (lfaa) to
that of the most favored amino acid (mfaa), subject to the
constraints of equal abundances of acidic and basic amino
acids, least possible number of stop codons, and, for
convenience, the third base being T or G, is shown in Table
10A and yields DNA molecules encoding each type of amino
acid with the abundances shown. Other complexly variegated
codons are obtainable by relaxing one or more constraints.

Note that this chemistry encodes all twenty amino
acids, with acidic and basic amino acids being equiprobable,
and the most favored amino acid (serine) is encoded only
2.454 times as often as the least favored amino acid
(tryptophan). The "fxS" vg codon improves sampling most for
peptides containing several of the amino acids
[F,Y,C,W,H,Q,I,M,N,K,D,E] for which NNK or NNS provide only
one codon. Its sampling advantages are most pronounced when
the library is relatively small.

The results of omitting the requirements of equality of
acids and bases and minimizing stop codons are shown in
Table 10B.

The advantages of an NNT codon are discussed elsewhere
in the present application. Unoptimized NNT provides 15
amino acids encoded by only 16 DNA sequences. It is
possible to improve on NNT with the distribution shown in
Table 10C, which gives five amino acids (SER, LEU, HIS, VAL,
ASP) in very nearly equal amounts. A further eight amino
acids (PHE, TYR, ILE, ASN, PRO, ALA, ARG, GLY) are present
at 78% the abundance of SER. THR and CYS remain at half the
abundance of SER. When variegating DNA for disulfide-bonded
micro-proteins, it is often desirable to reduce the
prevalence of CYS. This distribution allows 13 amino acids
to be seen at high level and gives no stops; the optimized
fxS distribution allows only 11 amino acids at high
prevalence.

The NNG codon can also be optimized. Table 10D shows
an approximately optimized ([ALA] = [ARG]) NNG codon. There
are, under this variegation, four equally most favored amino
acids: LEU, ARG, ALA, and GLU. Note that there is one
acidic and one basic amino acid in this set. There are two
equally least favored amino acids: TRP and MET. The ratio
of lfaa/mfaa is 0.5258. If this codon is repeated six
times, peptides composed entirely of TRP and MET are 2% as
common as peptides composed entirely of the most favored
amino acids. We refer to this as "the prevalence of
(TRP/MET)6 in optimized NNG6 vgDNA".

When synthesizing vgDNA by the "dirty bottle" method,
it is sometimes desirable to use only a limited number of
mixes. One very useful mixture is called the "optimized NNS
mixture" in which we average the first two positions of the
fxS mixture: T1 = 0.24, C1 = 0.17, A1 = 0.33, G1 = 0.26, the
second position is identical to the first, C3 = G3 = 0.5.
This distribution provides the amino acids ARG, SER, LEU,
GLY, VAL, THR, ASN, and LYS at greater than 5% plus ALA,
ASP, GLU, ILE, MET, and TYR at greater than 4%.
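The amino-acid abundances produced by such a mixture can be estimated by summing codon probabilities. The sketch below assumes the standard genetic code, independent incorporation at each position, and the nucleotide fractions read as T = 0.24, C = 0.17, A = 0.33, G = 0.26 at positions 1 and 2 with C = G = 0.5 at position 3; the helper name residue_abundance is hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide fractions of the "optimized NNS mixture" as read from the text.
POS12 = {"T": 0.24, "C": 0.17, "A": 0.33, "G": 0.26}  # positions 1 and 2
POS3 = {"C": 0.5, "G": 0.5}                           # position 3 (S)


def residue_abundance(p1, p2, p3):
    """Sum codon probabilities per encoded residue, positions independent."""
    out = defaultdict(float)
    for (b1, f1), (b2, f2), (b3, f3) in product(p1.items(), p2.items(), p3.items()):
        out[CODE[b1 + b2 + b3]] += f1 * f2 * f3
    return dict(out)


abund = residue_abundance(POS12, POS12, POS3)
for aa, frac in sorted(abund.items(), key=lambda kv: -kv[1]):
    print(aa, round(100 * frac, 2), "%")
```

With the fractions exactly as printed, the eight residues named first all exceed 5%, while ILE, MET and TYR come out at about 3.96%, marginally under the 4% quoted; this is presumably a rounding effect in the printed fractions.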

An additional complexly variegated codon is of
interest. This codon is identical to the optimized NNT
codon at the first two positions and has T:G::90:10 at the
third position. This codon provides thirteen amino acids
(ALA, ILE, ARG, SER, ASP, LEU, VAL, PHE, ASN, GLY, PRO, TYR,
and HIS) at more than 5.5%. THR at 4.3% and CYS at 3.9% are
more common than the LFAAs of NNK (3.125%). The remaining
five amino acids are present at less than 1%. This codon
has the feature that all amino acids are present; sequences
having more than two of the low-abundance amino acids are
rare. When we isolate an SBD using this codon, we can be
reasonably sure that the first 13 amino acids were tested at
each position. A similar codon, based on optimized NNG,
could be used.

Table 10E shows some properties of an unoptimized NNS
(or NNK) codon. Note that there are three equally
most-favored amino acids: ARG, LEU, and SER. There are also
twelve equally least favored amino acids: PHE, ILE, MET,
TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP. Five amino
acids (PRO, THR, ALA, VAL, GLY) fall in between. Note that
a six-fold repetition of NNS gives sequences composed of the
amino acids [PHE, ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP,
GLU, CYS, and TRP] at only ~0.1% of the sequences composed
of [ARG, LEU, and SER]. Not only is this ~20-fold lower
than the prevalence of (TRP/MET)6 in optimized NNG6 vgDNA,
but this low prevalence applies to twelve amino acids.

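The two prevalence figures can be compared directly. A small sketch, taking the lfaa/mfaa ratio of 0.5258 from the text and assuming that an unoptimized NNS codon gives a per-position ratio of one codon to three:

```python
def prevalence(lfaa_over_mfaa, n_codons):
    """Abundance of an all-least-favored peptide relative to an
    all-most-favored peptide, for n_codons identically variegated codons."""
    return lfaa_over_mfaa ** n_codons


nng_opt = prevalence(0.5258, 6)  # optimized NNG, ratio quoted in the text
nns = prevalence(1 / 3, 6)       # unoptimized NNS: 1 codon versus 3 codons
print(f"optimized NNG: {nng_opt:.4f}")
print(f"unoptimized NNS: {nns:.5f}")
print(f"ratio: {nng_opt / nns:.1f}")
```

Unrounded, the values are about 2.1% and 0.14%, a ratio of roughly 15; the ~20-fold figure corresponds to the rounded values of 2% and 0.1%.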
Diffuse Mutagenesis 

Diffuse Mutagenesis can be applied to any part of the
protein at any time, but is most appropriate when some
binding to the target has been established. Diffuse
Mutagenesis can be accomplished by spiking each of the pure
nts activated for DNA synthesis (e.g. nt-phosphoramidites)
with a small amount of one or more of the other activated
nts. Preferably, the level of spiking is set so that only
a small percentage (1% to .00001%, for example) of the final
product will contain the initial DNA sequence. This will
insure that many single, double, triple, and higher
mutations occur, but that recovery of the basic sequence
will be a possible outcome.
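One way to relate the spiking level to the fraction of unmutated product is sketched here. It assumes, as a simplification not stated in the text, that every doped base is replaced independently with the same total probability; the 60-base example length is hypothetical.

```python
def parental_fraction(spike, n_bases):
    """Fraction of product retaining the initial sequence when each of
    n_bases is independently replaced with total probability `spike`."""
    return (1.0 - spike) ** n_bases


def spike_for_target(target_fraction, n_bases):
    """Per-base spiking level that leaves `target_fraction` parental DNA."""
    return 1.0 - target_fraction ** (1.0 / n_bases)


n = 60  # hypothetical: 60 doped bases (20 codons)
for target in (1e-2, 1e-4, 1e-7):  # 1%, 0.01% and 0.00001% as fractions
    s = spike_for_target(target, n)
    print(f"target {target:g}: spike each base at {100 * s:.1f}% other nts")
```

Under this model, lower parental fractions call for heavier spiking, and for a fixed parental fraction a longer doped region needs a lighter per-base spike.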

III.D. Special Considerations Relating to Variegation of
Micro-Proteins with Essential Cysteines

Several of the preferred simple or complex variegated
codons encode a set of amino acids which includes cysteine.
This means that some of the encoded binding domains will
feature one or more cysteines in addition to the invariant
disulfide-bonded cysteines. For example, at each
NNT-encoded position, there is a one in sixteen chance of
obtaining cysteine. If six codons are so varied, the
fraction of domains containing additional cysteines is 0.33.
Odd numbers of cysteines can lead to complications, see
Perry and Wetzel (PERR84). On the other hand, many
disulfide-containing proteins contain cysteines that do not
form disulfides, e.g. trypsin. The possibility of unpaired
cysteines can be dealt with in several ways:
First, the variegated phage population can be passed
over an immobilized reagent that strongly binds free thiols,
such as SulfoLink (catalogue number 44895 H from Pierce
Chemical Company, Rockford, Illinois, 61105). Another
product from Pierce is TNB-Thiol Agarose (Catalogue Code
20409 H). BioRad sells Affi-Gel 401 (catalogue 153-4599)
for this purpose.

Second, one can use a variegation that excludes
cysteines, such as:

NHT that gives [F,S,Y,L,P,H,I,T,N,V,A,D],
VNS that gives [L²,P²,H,Q,R³,I,M,T²,N,K,S,V²,A²,E,D,G²],
NNG that gives [L²,S,W,P,Q,R²,M,T,K,V,A,E,G,stop],
SNT that gives [L,P,H,R,V,A,D,G],
RNG that gives [M,T,K,R,V,A,E,G],
RMG that gives [T,K,A,E],
VNT that gives [L,P,H,R,I,T,N,S,V,A,D,G], or
RRS that gives [N,S,K,R,D,E,G²].

However, each of these schemes has one or more of the
following disadvantages, relative to NNT: a) fewer amino
acids are allowed, b) amino acids are not evenly provided,
c) acidic and basic amino acids are not equally likely, or
d) stop codons occur. Nonetheless, NNG, NHT, and VNT are
almost as useful as NNT. NNG encodes 13 different amino
acids and one stop signal. Only two amino acids appear
twice in the 16-fold mix.

Thirdly, one can enrich the population for binding to
the preselected target, and evaluate selected sequences post
hoc for extra cysteines. Those that contain more cysteines
than the cysteines provided for conformational constraint
may be perfectly usable. It is possible that a disulfide
linkage other than the designed one will occur. This does
not mean that the binding domain defined by the isolated DNA
sequence is in any way unsuitable. The suitability of the
isolated domains is best determined by chemical and
biochemical evaluation of chemically synthesized peptides.

Lastly, one can block free thiols with reagents, such
as Ellman's reagent, iodoacetate, or methyl iodide, that
specifically bind free thiols and that do not react with
disulfides, and then leave the modified phage in the
population. It is to be understood that the blocking agent
may alter the binding properties of the micro-protein; thus,
one might use a variety of blocking reagents, in expectation
that different binding domains will be found. The
variegated population of thiol-blocked genetic packages is
fractionated for binding. If the DNA sequence of the
isolated binding micro-protein contains an odd number of
cysteines, then synthetic means are used to prepare
micro-proteins having each possible linkage and in which the
odd thiol is appropriately blocked. Nishiuchi (NISH82,
NISH86, and works cited therein) discloses methods of
synthesizing peptides that contain a plurality of cysteines
so that each thiol is protected with a different type of
blocking group. These groups can be selectively removed so
that the disulfide pairing can be controlled. We envision
using such a scheme with the alteration that one thiol
either remains blocked, or is unblocked and then reblocked
with a different reagent.
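The cysteine arithmetic used in this section can be checked with a short calculation. This sketch assumes the standard genetic code, IUPAC ambiguity codes, and independently varied codons; the helper names are illustrative.

```python
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "S": "CG", "M": "AC", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cys_fraction(vg_codon):
    """Fraction of DNA sequences of a variegated codon that encode cysteine."""
    codons = ["".join(c) for c in product(*(IUPAC[b] for b in vg_codon))]
    return sum(CODE[c] == "C" for c in codons) / len(codons)


def extra_cys_probability(vg_codon, n_positions):
    """Chance that at least one of n independently varied codons gives CYS."""
    return 1.0 - (1.0 - cys_fraction(vg_codon)) ** n_positions


print("NNT, six codons:", round(extra_cys_probability("NNT", 6), 3))
for vg in ("NHT", "VNS", "NNG", "SNT", "RNG", "RMG", "VNT", "RRS"):
    print(vg, "cysteine fraction:", cys_fraction(vg))
```

For six NNT codons this gives about 0.32, close to the 0.33 quoted, and each of the eight alternative codons listed under the second approach yields a cysteine fraction of zero.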

III.E. Planning the Second and Later Rounds of Variegation

The method of the present invention allows efficient
accumulation of information concerning the amino-acid
sequence of a binding domain having high affinity for a
predetermined target. Although one may obtain a highly
useful binding domain from a single round of variegation and
affinity enrichment, we expect that multiple rounds will be
needed to achieve the highest possible affinity and
specificity.

If the first round of variegation results in some
binding to the target, but the affinity for the target is
still too low, further improvement may be achieved by
variegation of the SBDs. Preferably, the process is
progressive, i.e. each variegation cycle produces a better
starting point for the next variegation cycle than the
previous cycle produced. Setting the level of variegation
such that the ppbd and many sequences related to the ppbd
sequence are present in detectable amounts ensures that the
process is progressive.

If the level of variegation is so high that the ppbd
sequence is present at such low levels that there is an
appreciable chance that no transformant will display the
PPBD, then the best SBD of the next round could be worse
than the PPBD. At excessively high levels of variegation,
each round of mutagenesis is independent of previous rounds
and there is no assurance of progressivity. This approach
can lead to valuable binding proteins, but repetition of
experiments with this level of variegation will not yield
progressive results. Excessive variation is not preferred.
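The "appreciable chance that no transformant will display the PPBD" can be estimated with a Poisson model. In this sketch the model, the library size of 10^7, and the choice of NNT at every varied position are all assumptions made for illustration.

```python
from math import exp


def p_parent_absent(n_transformants, parent_fraction):
    """Poisson estimate of the chance that no transformant carries the
    parental sequence, given its fraction in the variegated DNA."""
    return exp(-n_transformants * parent_fraction)


# Hypothetical example: k residues varied with NNT, every DNA sequence
# equally likely, so the parental DNA sequence is 1 in 16**k of the vgDNA.
for k in (4, 6, 8):
    print(k, "codons varied:", p_parent_absent(1e7, 1 / 16**k))
```

Under these assumptions the parental sequence is essentially certain to be present when four codons are varied, is missing in roughly half of such libraries at six codons, and is almost always missing at eight.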

Progressivity is not an all-or-nothing property. So
long as most of the information obtained from previous
variegation cycles is retained and many different surfaces
that are related to the PPBD surface are produced, the
process is progressive.

If the level of variegation in the previous variegation
cycle was correctly chosen, then the amino acids selected to
be in the residues just varied are the ones best determined.
The environment of other residues has changed, so that it is
appropriate to vary them again. Because there are often
more residues of interest than can be varied simultaneously,
we may continue by picking residues that either have never
been varied (highest priority) or that have not been varied
for one or more cycles.

Use of NNT or NNG variegated codons leads to very
efficient sampling of variegated libraries because the ratio
of (different amino-acid sequences)/(different DNA
sequences) is much closer to unity than it is for NNK or
even the optimized vg codon (fxS). Nevertheless, a few
amino acids are omitted in each case. Both NNT and NNG
allow members of all important classes of amino acids:
hydrophobic, hydrophilic, acidic, basic, neutral, small, and
large. After selecting a binding domain, a subsequent
variegation and selection may be desirable to achieve a
higher affinity or specificity. During this second
variegation, amino acid possibilities overlooked by the
preceding variegation may be investigated.

A few examples may be helpful. Suppose we obtained PRO
using NNT. This amino acid is available with either NNT or
NNG. We can be reasonably sure that PRO is the best amino
acid from the set [PRO, LEU, VAL, THR, ALA, ARG, GLY, PHE,
TYR, CYS, HIS, ILE, ASN, ASP, SER]. We next might try a set
that includes [PRO, TRP, GLN, MET, LYS, GLU]. The set
allowed by NNG is the preferred set.
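The choice of NNG for the follow-up round in the PRO example can be seen from a set difference. A minimal sketch, assuming the standard genetic code (the helper residues is illustrative):

```python
from itertools import product

# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residues(third_base):
    """Amino acids encoded by NN<third_base>, excluding stop."""
    found = {CODE[b1 + b2 + third_base] for b1, b2 in product(BASES, repeat=2)}
    return found - {"*"}


nnt, nng = residues("T"), residues("G")
new_in_nng = nng - nnt
print("tested by NNT:", sorted(nnt))
print("newly offered by NNG:", sorted(new_in_nng))
print("PRO available in both:", "P" in nnt and "P" in nng)
```

The residues newly offered by NNG are E, K, M, Q and W, which together with PRO form the set [PRO, TRP, GLN, MET, LYS, GLU] named in the example.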

What if we obtained HIS instead? Histidine is aromatic
and fairly hydrophobic and can form hydrogen bonds to and
from the imidazole ring. Tryptophan is hydrophobic and
aromatic and can donate a hydrogen to a suitable acceptor,
and was excluded by the NNT codon. Methionine was also
excluded and is hydrophobic. Thus, one preferred course is
to use the variegated codon HDS that allows [HIS, GLN, ASN,
LYS, TYR, CYS, TRP, ARG, SER, GLY, <stop>].

If the first round of variegation is entirely
unsuccessful, a different pattern of variegation should be
used. For example, if more than one interaction set can be
defined within a domain, the residues varied in the next
round of variegation should be from a different set than
that probed in the initial variegation. If repeated
failures are encountered, one may switch to a different
IPBD.



IV. DISPLAY STRATEGY: DISPLAYING FOREIGN BINDING DOMAINS ON
THE SURFACE OF A "GENETIC PACKAGE"

IV.A. General Requirements for Genetic Packages

In order to obtain the display of a multitude of
different though related potential binding domains,
applicants generate a heterogeneous population of replicable
genetic packages each of which comprises a hybrid gene
including a first DNA sequence which encodes a potential
binding domain for the target of interest and a second DNA
sequence which encodes a display means, such as an outer
surface protein native to the genetic package but not
natively associated with the potential binding domain (or
the parental binding domain to which it is related) which
causes the genetic package to display the corresponding
chimeric protein (or a processed form thereof) on its outer
surface.

The component of a population that exhibits the desired
binding properties may be quite small, for example, one in
10^6 or less. Once this component of the population is
separated from the non-binding components, it must be
possible to amplify it. Culturing viable cells is the most
powerful means of amplifying genetic material known and is
preferred. Genetic messages can also be amplified in vitro,
e.g. by PCR, but this is not the most preferred method.

Preferably, the GP can be: 1) genetically altered with
reasonable facility to encode a potential binding domain, 2)
maintained and amplified in culture, 3) manipulated to
display the potential binding protein domain where it can
interact with the target material during affinity
separation, and 4) affinity separated while retaining the
genetic information encoding the displayed binding domain in
recoverable form. Preferably, the GP remains viable after
affinity separation. Preferred GPs are vegetative bacterial
cells, bacterial spores and, especially, bacterial DNA
viruses. Eukaryotic cells and eukaryotic viruses may be
used as genetic packages, but are not preferred.

When the genetic package is a bacterial cell, or a
phage which is assembled periplasmically, the display means
has two components. The first component is a secretion
signal which directs the initial expression product to the
inner membrane of the cell (a host cell when the package is
a phage). This secretion signal is cleaved off by a signal
peptidase to yield a processed, mature, potential binding
protein. The second component is an outer surface transport
signal which directs the package to assemble the processed
protein into its outer surface. Preferably, this outer
surface transport signal is derived from a surface protein
native to the genetic package.

For example, in a preferred embodiment, the hybrid gene
comprises a DNA encoding a potential binding domain operably
linked to a signal sequence (e.g., the signal sequences of
the bacterial phoA or bla genes or the signal sequence of
M13 phage gene III) and to DNA encoding a coat protein
(e.g., the M13 gene III or gene VIII proteins) of a
filamentous phage (e.g., M13). The expression product is
transported to the inner membrane (lipid bilayer) of the
host cell, whereupon the signal peptide is cleaved off to
leave a processed hybrid protein. The C-terminus of the
coat protein-like component of this hybrid protein is
trapped in the lipid bilayer, so that the hybrid protein
does not escape into the periplasmic space. (This is
typical of the wild-type coat protein.) As the
single-stranded DNA of the nascent phage particle passes
into the periplasmic space, it collects both wild-type coat
protein and the hybrid protein from the lipid bilayer. The
hybrid protein is thus packaged into the surface sheath of
the filamentous phage, leaving the potential binding domain
exposed on its outer surface. (Thus, the filamentous phage,
not the host bacterial cell, is the "replicable genetic
package" in this embodiment.)

If a secretion signal is necessary for the display of
the potential binding domain, in an especially preferred
embodiment the bacterial cell in which the hybrid gene is
expressed is of a "secretion-permissive" strain.

When the genetic package is a bacterial spore, or a
phage (such as ΦX174 or λ) whose coat is assembled
intracellularly, a secretion signal directing the expression
product to the inner membrane of the host bacterial cell is
unnecessary. In these cases, the display means is merely
the outer surface transport signal, typically a derivative
of a spore or phage coat protein.

Preferred OSPs for several GPs are given in Table 2.
References to osp-ipbd fusions in this section should be
taken to apply, mutatis mutandis, to osp-pbd and osp-sbd
fusions as well.

IV.B. Phages for Use as GPs:

Periplasmically assembled phage are preferred when the
IPBD is a disulfide-bonded micro-protein, as such IPBDs may
not fold within a cell (these proteins may fold after the
phage is released from the cell). Intracellularly assembled
phage are preferred when the IPBD needs large or insoluble
prosthetic groups (such as Fe4S4 clusters), since the IPBD
may not fold if secreted because the prosthetic group is
lacking in the periplasm.

When variegation is introduced, multiple infections
could generate hybrid GPs that carry the gene for one PBD
but have at least some copies of a different PBD on their
surfaces; it is preferable to minimize this possibility by
infecting cells with phage under conditions resulting in a
low multiplicity of infection (MOI).
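The benefit of a low MOI can be illustrated with the usual Poisson approximation for the number of phage entering each cell. This model is an assumption added for illustration, not part of the disclosed method.

```python
from math import exp


def multiply_infected_fraction(moi):
    """Among infected cells, the fraction receiving two or more phage,
    assuming phage per cell is Poisson-distributed with mean `moi`."""
    p0 = exp(-moi)        # cells receiving no phage
    p1 = moi * exp(-moi)  # cells receiving exactly one phage
    return (1.0 - p0 - p1) / (1.0 - p0)


for moi in (0.01, 0.1, 1.0, 5.0):
    print(moi, round(multiply_infected_fraction(moi), 4))
```

Under this model, at an MOI of 0.1 about 5% of infected cells carry more than one phage, against roughly 42% at an MOI of 1.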

Bacteriophages are excellent candidates for GPs because
there is little or no enzymatic activity associated with
intact mature phage, and because the genes are inactive
outside a bacterial host, rendering the mature phage
particles metabolically inert.

For a given bacteriophage, the preferred OSP is usually
one that is present on the phage surface in the largest
number of copies. Nevertheless, an OSP such as M13 gIII
protein (5 copies/phage) may be an excellent choice as OSP
to cause display of the PBD.

It is preferred that the wild-type osp gene be
preserved. The ipbd gene fragment may be inserted either
into a second copy of the recipient osp gene or into a novel
engineered osp gene. It is preferred that the osp-ipbd gene
be placed under control of a regulated promoter.

The user must choose a site in the candidate OSP gene
for inserting an ipbd gene fragment. The coats of most
bacteriophage are highly ordered. In such bacteriophage, it
is important to retain in engineered OSP-IPBD fusion
proteins those residues of the parental OSP that interact
with other proteins in the virion. For M13 gVIII, we
preferably retain the entire mature protein, while for M13
gIII, it might suffice to retain the last 100 residues
(BASS90) (or even fewer). Such a truncated gIII protein
would be expressed in parallel with the complete gIII
protein, as gIII protein is required for phage infectivity.

The filamentous phage, which include M13, f1, fd, If1,
Ike, Xf, Pf1, and Pf3, are of particular interest. The
major coat protein is encoded by gene VIII. The 50 amino
acid mature gene VIII coat protein is synthesized as a 73
amino acid precoat (ITOK79). The first 23 amino acids
constitute a typical signal-sequence which causes the
nascent polypeptide to be inserted into the inner cell
membrane.

An E. coli signal peptidase (SP-I) recognizes amino
acids 18, 21, and 23, and, to a lesser extent, residue 22,
and cuts between residues 23 and 24 of the precoat (KUHN85a,
KUHN85b, OLIV87). After removal of the signal sequence, the
amino terminus of the mature coat is located on the
periplasmic side of the inner membrane; the carboxy terminus
is on the cytoplasmic side. About 3000 copies of the mature
50 amino acid coat protein associate side-by-side in the
inner membrane.

The sequence of gene VIII is known, and the amino acid
sequence can be encoded on a synthetic gene placed under
the lacUV5 promoter and used in conjunction with the LacIq
repressor. The lacUV5 promoter is induced by IPTG. Mature
gene VIII protein makes up the sheath around the circular
ssDNA. The 3D structure of f1 virion is known at medium
resolution; the amino terminus of gene VIII protein is on
the surface of the virion and is therefore a preferred
attachment site for the potential binding domain. A few
modifications of gene VIII have been made and are discussed
below. The 2D structure of M13 coat protein is implicit in
the 3D structure. Mature M13 gene VIII protein has only one
domain.

We have constructed a tripartite gene comprising:
1) DNA encoding a signal sequence directing secretion of
   parts (2) and (3) through the inner membrane,
2) DNA encoding the mature BPTI sequence, and
3) DNA encoding the mature M13 gVIII protein.
This gene causes BPTI to appear in active form on the
surface of M13 phage.

The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1
         1    1    2   |2    3    3    4    4    5
    5    0    5    0   V5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA

    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS

The best site for inserting a novel protein domain into M13
CP is after A23 because SP-I cleaves the precoat protein
after A23, as indicated by the arrow. Proteins that can be
secreted will appear connected to mature M13 CP at its amino
terminus. Because the amino terminus of mature M13 CP is
located on the outer surface of the virion, the introduced
domain will be displayed on the outside of the virion. The
uncertainty of the mechanism by which M13 CP appears in the
lipid bilayer raises the possibility that direct insertion
of bpti into gene VIII may not yield a functional fusion
protein. It may be necessary to change the signal sequence
of the fusion to, for example, the phoA signal sequence
(MKQSTIALALLPLLFTPVTKA...) (MARK91). Marks et al.
(MARK86) showed that the phoA signal peptide could direct
mature BPTI to the E. coli periplasm.
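
The cleavage and insertion site described above can be sketched
at the protein level. This is a minimal illustration only: the
sequence string is our transcription of AA_seq1 (an assumption
wherever the printed copy is unclear, chosen to match the stated
73-residue length), and the inserted domain "ACDEF" is a
hypothetical placeholder, not a sequence from this specification.

```python
# Precoat sequence AA_seq1 as transcribed from the text (73 residues).
PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA"
    "MVVVIVGATIGIKLFKKFTSKAS"
)

# SP-I cuts between residues 23 and 24 (1-based numbering).
CLEAVAGE_AFTER = 23


def split_precoat(precoat, cut_after=CLEAVAGE_AFTER):
    """Return (signal, mature) for a cut after residue `cut_after`."""
    return precoat[:cut_after], precoat[cut_after:]


def display_fusion(signal, insert, mature):
    """Protein-level fusion: signal :: inserted domain :: mature coat."""
    return signal + insert + mature


signal, mature = split_precoat(PRECOAT)

# Hypothetical placeholder for the domain to be displayed.
fusion = display_fusion(signal, "ACDEF", mature)
print(len(signal), len(mature), len(fusion))
```

After signal-peptidase cleavage, the inserted domain would sit at
the amino terminus of the mature coat protein, which is the end
the text places on the outside of the virion.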

Another vehicle for displaying the IPBD is by
expressing it as a domain of a chimeric gene containing part
or all of gene III. This gene encodes one of the minor coat
proteins of M13. Genes VI, VII, and IX also encode minor
coat proteins. Each of these minor proteins is present in
about 5 copies per virion and is related to morphogenesis or
infection. In contrast, the major coat protein is present
in more than 2500 copies per virion. The gene VI, VII, and
IX proteins are present at the ends of the virion; these
three proteins are not post-translationally processed
(RASC86).



The single-stranded circular phage DNA associates with
about five copies of the gene III protein and is then
extruded through the patch of membrane-associated coat
protein in such a way that the DNA is encased in a helical
sheath of protein (WEBS78). The DNA does not base pair
(that would impose severe restrictions on the virus genome);
rather the bases intercalate with each other independent of
sequence.

Smith (SMIT85) and de la Cruz et al. (DELA88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. The
mini-protein's gene may be fused to gene III at the site used
by Smith and by de la Cruz et al., at a codon corresponding to
another domain boundary or to a surface loop of the protein,
or to the amino terminus of the mature protein.

All published works use a vector containing a single
modified gene III of fd. Thus, all five copies of gIII are
identically modified. Gene III is quite large (1272 b.p. or
about 20% of the phage genome) and it is uncertain whether
a duplicate of the whole gene can be stably inserted into
the phage. Furthermore, all five copies of gIII protein are
at one end of the virion. When bivalent target molecules
(such as antibodies) bind a pentavalent phage, the resulting
complex may be irreversible. Irreversible binding of the GP
to the target greatly interferes with affinity enrichment of
the GPs that carry the genetic sequences encoding the novel
polypeptide having the highest affinity for the target.

To reduce the likelihood of formation of irreversible
complexes, we may use a second, synthetic gene that encodes
carboxy-terminal parts of III; the carboxy-terminal parts of
the gene III protein cause it to assemble into the phage.
For example, the final 29 residues (starting with the
arginine specified by codon 398) may be enough to cause a
fusion protein to assemble into the phage. Alternatively,
one might include the final globular domain of mature gIII
protein, viz. the final 150 to 160 amino acids of gene III
(BASS90). We might, for example, engineer a gene that
consists of (from 5' to 3'):

1) a promoter (preferably regulated),
2) a ribosome binding site,
3) an initiation codon,
4) a functional signal peptide directing secretion of
   parts (5) and (6) through the inner membrane,
5) DNA encoding an IPBD,
6) DNA encoding residues 275 through 424 of M13 gIII
   protein,
7) a translation stop codon, and
8) (optionally) a transcription stop signal.

We leave the wild-type gene III so that some unaltered gene
III protein will be present. Alternatively, we may use gene
VIII protein as the OSP and regulate the osp::ipbd fusion so
that only one or a few copies of the fusion protein appear
on the phage.

M13 gene VI, VII, and IX proteins are not processed
after translation. The route by which these proteins are
assembled into the phage has not been reported. These
proteins are necessary for normal morphogenesis and
infectivity of the phage. Whether these molecules (gene VI
protein, gene VII protein, and gene IX protein) attach
themselves to the phage: a) from the cytoplasm, b) from the
periplasm, or c) from within the lipid bilayer, is not
known. One could use any of these proteins to introduce an
IPBD onto the phage surface by one of the constructions:
1) ipbd::pmcp,
2) pmcp::ipbd,
3) signal::ipbd::pmcp, and
4) signal::pmcp::ipbd,
where ipbd represents DNA coding on expression for the
initial potential binding domain; pmcp represents DNA coding
for one of the phage minor coat proteins, VI, VII, and IX;
signal represents a functional secretion signal peptide,
such as the phoA signal (MKQSTIALALLPLLFTPVTKA); and "::"
represents in-frame genetic fusion. The indicated fusions
are placed downstream of a known promoter, preferably a
regulated promoter such as lacUV5, tac, or trp. Fusions (1)
and (2) are appropriate when the minor coat protein attaches
to the phage from the cytoplasm or by autonomous insertion
into the lipid bilayer. Fusion (1) is appropriate if the
amino terminus of the minor coat protein is free and (2) is
appropriate if the carboxy terminus is free. Fusions (3)
and (4) are appropriate if the minor coat protein attaches
to the phage from the periplasm or from within the lipid
bilayer. Fusion (3) is appropriate if the amino terminus of
the minor coat protein is free and (4) is appropriate if the
carboxy terminus is free.
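
The choice among constructions (1) through (4) amounts to a small
decision table, which can be sketched as follows. The function
name and the route labels ("cytoplasm", "autonomous_insertion",
"periplasm", "within_bilayer") are illustrative names of our own;
only the four construction strings come from the text.

```python
# Routes for which the text calls for a secretion signal (fusions 3 and 4).
SIGNAL_ROUTES = {"periplasm", "within_bilayer"}
# Routes for which no signal is added (fusions 1 and 2).
NO_SIGNAL_ROUTES = {"cytoplasm", "autonomous_insertion"}


def choose_fusion(route, free_terminus):
    """Pick a construction from the attachment route and the free terminus.

    route: how the minor coat protein reaches the phage (hypothetical labels).
    free_terminus: "amino" or "carboxy" terminus of the minor coat protein.
    """
    if free_terminus not in ("amino", "carboxy"):
        raise ValueError("free_terminus must be 'amino' or 'carboxy'")
    if route in SIGNAL_ROUTES:
        prefix = "signal::"
    elif route in NO_SIGNAL_ROUTES:
        prefix = ""
    else:
        raise ValueError("unknown route: %r" % route)
    # The IPBD is attached at whichever terminus of the coat protein is free.
    body = "ipbd::pmcp" if free_terminus == "amino" else "pmcp::ipbd"
    return prefix + body


print(choose_fusion("cytoplasm", "amino"))
print(choose_fusion("periplasm", "carboxy"))
```

Because the route is stated to be unknown for the gene VI, VII,
and IX proteins, in practice more than one of the four
constructions would likely be built and tested.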

Similar constructions could be made with other
filamentous phage. Pf3 is a well known filamentous phage
that infects Pseudomonas aeruginosa cells that harbor an
IncP-1 plasmid. The major coat protein of Pf3 is unusual in
having no signal peptide to direct its secretion. The
sequence has charged residues ASP7, ARG37, LYS40, and
PHE44-COO-, which is consistent with the amino terminus being
exposed. Thus, to cause an IPBD to appear on the surface of
Pf3, we construct a tripartite gene comprising:
1) a signal sequence known to cause secretion in P.
   aeruginosa (preferably known to cause secretion of
   IPBD) fused in-frame to,
2) a gene fragment encoding the IPBD sequence, fused
   in-frame to,
3) DNA encoding the mature Pf3 coat protein.
Optionally, DNA encoding a flexible linker of one to 10
amino acids and/or amino acids forming a recognition site
for a specific protease (e.g., Factor Xa) is introduced
between the ipbd gene fragment and the Pf3 coat-protein
gene. This tripartite gene is introduced into Pf3 so that
it does not interfere with expression of any Pf3 genes. To
reduce the possibility of genetic recombination, part (3) is
designed to have numerous silent mutations relative to the
wild-type gene. Once the signal sequence is cleaved off,
the IPBD is in the periplasm and the mature coat protein
acts as an anchor and phage-assembly signal. It does not
matter that this fusion protein comes to rest in the lipid
bilayer by a route different from the route followed by the
wild-type coat protein.

As described in WO90/02809, other phage, such as
bacteriophage ΦX174, large DNA phage such as λ or T4, and
even RNA phage, may with suitable adaptations and
modifications be used as GPs.

IV.C. Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial strain
which (1) may be grown in culture, (2) may be engineered to
display PBDs on its surface, and (3) is compatible with
affinity selection.

Among bacterial cells, the preferred genetic packages
are Salmonella typhimurium, Bacillus subtilis, Pseudomonas
aeruginosa, Vibrio cholerae, Klebsiella pneumoniae, Neisseria
gonorrhoeae, Neisseria meningitidis, Bacteroides nodosus,
Moraxella bovis, and especially Escherichia coli. The
potential binding mini-protein may be expressed as an insert
in a chimeric bacterial outer surface protein (OSP). All
bacteria exhibit proteins on their outer surfaces. E. coli
is the preferred bacterial GP and, for it, LamB is a
preferred OSP.

While most bacterial proteins remain in the cytoplasm,
others are transported to the periplasmic space (which lies
between the plasma membrane and the cell wall of
gram-negative bacteria), or are conveyed and anchored to the
outer surface of the cell. Still others are exported
(secreted) into the medium surrounding the cell. Those
characteristics of a protein that are recognized by a cell
and that cause it to be transported out of the cytoplasm and
displayed on the cell surface will be termed "outer-surface
transport signals".

Gram-negative bacteria have outer-membrane proteins
(OMP), that form a subset of OSPs. Many OMPs span the
membrane one or more times. The signals that cause OMPs to
localize in the outer membrane are encoded in the amino acid
sequence of the mature protein. Outer membrane proteins of
bacteria are initially expressed in a precursor form
including a so-called signal peptide. The precursor protein
is transported to the inner membrane, and the signal peptide
moiety is extruded into the periplasmic space. There, it is
cleaved off by a "signal peptidase", and the remaining
"mature" protein can now enter the periplasm. Once there,
other cellular mechanisms recognize structures in the mature
protein which indicate that its proper place is on the outer
membrane, and transport it to that location.

It is well known that the DNA coding for the leader or
signal peptide from one protein may be attached to the DNA
sequence coding for another protein, protein X, to form a
chimeric gene whose expression causes protein X to appear
free in the periplasm. The use of export-permissive
bacterial strains (LISS85, STAD89) increases the probability
that a signal-sequence-fusion will direct the desired
protein to the cell surface.

OSP-IPBD fusion proteins need not fill a structural
role in the outer membranes of Gram-negative bacteria
because parts of the outer membranes are not highly ordered.
For large OSPs there is likely to be one or more sites at
which osp can be truncated and fused to ipbd such that cells
expressing the fusion will display IPBDs on the cell
surface. Fusions of fragments of omp genes with fragments
of an x gene have led to X appearing on the outer membrane
(CHAR88b,c, BENS84, CLEM81). When such fusions have been
made, we can design an osp-ipbd gene by substituting ipbd
for x in the DNA sequence. Otherwise, a successful OMP-IPBD
fusion is preferably sought by fusing fragments of the best
omp to an ipbd, expressing the fused gene, and testing the
resultant GPs for the display-of-IPBD phenotype. We use the
available data about the OMP to pick the point or points of
fusion between omp and ipbd to maximize the likelihood that
IPBD will be displayed. (Spacer DNA encoding flexible
linkers, made, e.g., of GLY, SER, and ASN, may be placed
between the osp- and ipbd-derived fragments to facilitate
display.) Alternatively, we truncate osp at several sites
or in a manner that produces osp fragments of variable
length and fuse the osp fragments to ipbd; cells expressing
the fusion are then screened or selected for display of
IPBDs on the cell surface. Freudl et al. (FREU89) have shown
that fragments of OSPs (such as OmpA) above a certain size are
incorporated into the outer membrane. An additional
alternative is to include short segments of random DNA in
the fusion of omp fragments to ipbd and then screen or
select the resulting variegated population for members
exhibiting the display-of-IPBD phenotype.

In E. coli, the LamB protein is a well understood OSP
and can be used. The E. coli LamB has been expressed in
functional form in S. typhimurium, V. cholerae, and K.
pneumoniae, so that one could display a population of PBDs in
any of these species as a fusion to E. coli LamB. K. pneumoniae
expresses a maltoporin similar to LamB (WEHM89) which could
also be used. In P. aeruginosa, the D1 protein (a
homologue of LamB) can be used (TRIA88).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are required for
successful transport (BENS84). As with other OSPs, LamB of
E. coli is synthesized with a typical signal-sequence which
is subsequently removed. Homology between parts of LamB
protein and other outer membrane proteins OmpC, OmpF, and
PhoE has been detected (NIKA84), including homology between
LamB amino acids 39-49 and sequences of the other proteins.
These subsequences may label the proteins for transport to
the outer membrane.

The amino acid sequence of LamB is known (CLEM81), and
a model has been developed of how it anchors itself to the
outer membrane (reviewed by, among others, BENZ88b). The
locations of its maltose and phage binding domains are also
known (HEIN88). Using this information, one may identify
several strategies by which a PBD insert may be incorporated
into LamB to provide a chimeric OSP which displays the PBD
on the bacterial outer membrane.

When the PBDs are to be displayed by a chimeric
transmembrane protein like LamB, the PBD could be inserted into
a loop normally found on the surface of the cell (cf.
BECK83, MANO86). Alternatively, we may fuse a 5' segment of
the osp gene to the ipbd gene fragment; the point of fusion
is picked to correspond to a surface-exposed loop of the OSP
and the carboxy terminal portions of the OSP are omitted.
In LamB, it has been found that up to 60 amino acids may be
inserted (CHAR88b,c) with display of the foreign epitope
resulting; the structural features of OmpC, OmpA, OmpF, and
PhoE are so similar that one expects similar behavior from
these proteins.

It should be noted that while LamB may be characterized
as a binding protein, it is used in the present invention to
provide an OSTS; its binding domains are not variegated.

Other bacterial outer surface proteins, such as OmpA,
OmpC, OmpF, PhoE, and pilin, may be used in place of LamB
and its homologues. OmpA is of particular interest because
it is very abundant and because homologues are known in a
wide variety of gram-negative bacterial species. Baker et
al. (BAKE87) review assembly of proteins into the outer
membrane of E. coli and cite a topological model of OmpA
(VOGE86) that predicts that residues 19-32, 62-73, 105-118,
and 147-158 are exposed on the cell surface. Insertion of
an ipbd-encoding fragment at about codon 111 or at about
codon 152 is likely to cause the IPBD to be displayed on the
cell surface. Concerning OmpA, see also MACI88 and MANO88.
Porin Protein F of Pseudomonas aeruginosa has been cloned
and has sequence homology to OmpA of E. coli (DUCH88).
Although this homology is not sufficient to allow prediction
of surface-exposed residues on Porin Protein F, the methods
used to determine the topological model of OmpA may be
applied to Porin Protein F. Works related to use of OmpA as
an OSP include BECK80 and MACI88.
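
As a small check on the OmpA insertion sites, a candidate codon
can be tested against the surface-exposed segments quoted from the
topological model. This sketch assumes that codon numbers and
residue numbers share one numbering scheme; the function name is
hypothetical.

```python
# Surface-exposed residue ranges of OmpA as quoted in the text (inclusive).
OMPA_EXPOSED = [(19, 32), (62, 73), (105, 118), (147, 158)]


def exposed_segment(codon, segments=OMPA_EXPOSED):
    """Return the exposed segment containing `codon`, or None if buried."""
    for start, end in segments:
        if start <= codon <= end:
            return (start, end)
    return None


for site in (111, 152, 90):
    print(site, exposed_segment(site))
```

Both suggested insertion points, codons 111 and 152, fall inside
predicted exposed segments, while a position such as 90 does not.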

Misra and Benson (MISR88a, MISR88b) disclose a
topological model of E. coli OmpC that predicts that, among
others, residues GLY164 and LEU250 are exposed on the cell
surface. Thus insertion of an ipbd gene fragment at about
codon 164 or at about codon 250 of the E. coli ompC gene or
at corresponding codons of the S. typhimurium ompC gene is
likely to cause IPBD to appear on the cell surface. The
ompC genes of other bacterial species may be used. Other
works related to OmpC include CATR87 and CLIC88.

OmpF of E. coli is a very abundant OSP, approximately 10^4
copies/cell. Pages et al. (PAGE90) have published a model of
OmpF indicating seven surface-exposed segments. Fusion of an
ipbd gene fragment, either as an insert or to replace the 3'
part of ompF, in one of the indicated regions is likely to
produce a functional ompF::ipbd gene the expression of which
leads to display of IPBD on the cell surface. In
particular, fusion at about codon 111, 177, 217, or 245
should lead to a functional ompF::ipbd gene. Concerning
OmpF, see also REID88b, PAGE88, BENS88, TOMM82, and SODE85.

Pilus proteins are of particular interest because
piliated cells express many copies of these proteins and
because several species (N. gonorrhoeae, P. aeruginosa,
Moraxella bovis, Bacteroides nodosus, and E. coli) express
related pilins. Getzoff and coworkers (GETZ88, PARG87,
SOME85) have constructed a model of the gonococcal pilus
that predicts that the protein forms a four-helix bundle
having structural similarities to tobacco mosaic virus
protein and myohemerythrin. On this model, both the amino
and carboxy termini of the protein are exposed. The amino
terminus is methylated. Elleman (ELLE88) has reviewed
pilins of Bacteroides nodosus and other species; serotype
differences can be related to differences in the pilin
protein, and most variation occurs in the C-terminal
region. The amino-terminal portions of the pilin protein
are highly conserved. Jennings et al. (JENN89) have grafted
a fragment of foot-and-mouth disease virus (residues
144-159) into the B. nodosus type 4 fimbrial protein which is
highly homologous to gonococcal pilin. They found that
expression of the 3'-terminal fusion in P. aeruginosa led to
a viable strain that makes detectable amounts of the fusion
protein. Jennings et al. did not vary the foreign epitope
nor did they suggest any variation. They inserted a GLY-GLY
linker between the last pilin residue and the first residue
of the foreign epitope to provide a "flexible linker". Thus
a preferred place to attach an IPBD is the carboxy terminus.
The exposed loops of the bundle could also be used, although
the particular internal fusions tested by Jennings et al.
(JENN89) appeared to be lethal in P. aeruginosa. Concerning
pilin, see also MCKE85 and ORND85.

Judd (JUDD86, JUDD85) has investigated Protein IA of N.
gonorrhoeae and found that the amino terminus is exposed;
thus, one could attach an IPBD at or near the amino terminus
of the mature P.IA as a means to display the IPBD on the N.
gonorrhoeae surface.

A model of the topology of PhoE of E. coli has been
disclosed by van der Ley et al. (VAND86). This model
predicts eight loops that are exposed; insertion of an IPBD
into one of these loops is likely to lead to display of the
IPBD on the surface of the cell. Residues 158, 201, 238,
and 275 are preferred locations for insertion of an IPBD.

Other OSPs that could be used include E. coli BtuB,
FepA, FhuA, IutA, FecA, and FhuE (GUDM89) which are
receptors for nutrients usually found in low abundance. The
genes of all these proteins have been sequenced, but
topological models are not yet available. Gudmunsdottir et
al. (GUDM89) have begun the construction of such a model for
BtuB and FepA by showing that certain residues of BtuB face
the periplasm and by determining the functionality of
various BtuB::FepA fusions. Carmel et al. (CARM90) have
reported work of a similar nature for FhuA. All Neisseria
species express outer surface proteins for iron transport
that have been identified and, in many cases, cloned. See
also MORS87 and MORS88.

Many gram-negative bacteria express one or more
phospholipases. E. coli phospholipase A, product of the
pldA gene, has been cloned and sequenced by de Geus et al.
(DEGE84). They found that the protein appears at the cell
surface without any posttranslational processing. An ipbd
gene fragment can be attached at either terminus or inserted
at positions predicted to encode loops in the protein. That
phospholipase A arrives on the outer surface without removal
of a signal sequence does not prove that a PldA::IPBD fusion
protein will also follow this route. Thus, we might cause a
PldA::IPBD or IPBD::PldA fusion to be secreted into the
periplasm by addition of an appropriate signal sequence.
Thus, in addition to simple binary fusion of an ipbd
fragment to one terminus of pldA, the constructions:

1) ss::ipbd::pldA
2) ss::pldA::ipbd

should be tested. Once the PldA::IPBD protein is free in
the periplasm, how it got there no longer matters, and the
structural features of PldA that cause it to localize on the
outer surface will direct the fusion to the same
destination.

IV.D. Bacterial Spores as Genetic Packages:

Bacterial spores have desirable properties as GP
candidates. Spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical agents,
and hence permit the use of a great variety of affinity
selection conditions. Also, Bacillus spores neither
actively metabolize nor alter the proteins on their surface.
Bacillus spores, and more especially B. subtilis spores, are
therefore the preferred sporoidal GPs. As discussed more
fully in WO90/02809, a foreign binding domain may be
introduced into an outer surface protein such as that
encoded by the B. subtilis cotC or cotD genes.

It is generally preferable to use as the genetic
package a cell, spore or virus for which an outer surface
protein which can be engineered to display an IPBD has
already been identified. However, as explained in
WO90/02809, the present invention is not limited to such
genetic packages, as an outer surface transport signal may
be generated by variegation-and-selection techniques.
V.E Genetic Construction and Expression Considerations 

The ipbd-osp gene may be: a) completely synthetic, b)
a composite of natural and synthetic DNA, or c) a composite
of natural DNA fragments. The important point is that the
pbd segment be easily variegated so as to encode a
multitudinous and diverse family of PBDs as previously
described. A synthetic ipbd segment is preferred because it
allows greatest control over placement of restriction sites.
Primers complementary to regions abutting the osp-ipbd gene
on its 3' flank and to parts of the osp-ipbd gene that are
not to be varied are needed for sequencing.

The sequences of regulatory parts of the gene are taken
from the sequences of natural regulatory elements: a)
promoters, b) Shine-Dalgarno sequences, and c)
transcriptional terminators. Regulatory elements could also be
designed from knowledge of consensus sequences of natural
regulatory regions. The sequences of these regulatory
elements are connected to the coding regions; restriction
sites are also inserted in or adjacent to the regulatory
regions to allow convenient manipulation.

The essential function of the affinity separation is to
separate GPs that bear PBDs (derived from IPBD) having high
affinity for the target from GPs bearing PBDs having low
affinity for the target. If the elution volume of a GP
depends on the number of PBDs on the GP surface, then a GP
bearing many PBDs with low affinity, GP(PBDw), might
co-elute with a GP bearing fewer PBDs with high affinity,
GP(PBDs). Regulation of the osp-pbd gene preferably is such
that most packages display sufficient PBD to effect a good
separation according to affinity. Use of a regulatable
promoter to control the level of expression of the osp-pbd
allows fine adjustment of the chromatographic behavior of
the variegated population.
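
The co-elution concern can be illustrated with a deliberately
simple toy model. The assumption that retention scales as (number
of displayed PBDs) x (association constant) is ours alone and is
not taken from this specification; real chromatographic behavior
need not be linear. The copy numbers and constants are likewise
hypothetical.

```python
import math


def retention_score(copies, ka):
    """Toy retention measure: displayed copies times association constant.

    A purely illustrative linear model, not a claim about real columns.
    """
    return copies * ka


# Hypothetical packages: many weak binders versus one strong binder.
weak_many = retention_score(copies=100, ka=1e6)
strong_one = retention_score(copies=1, ka=1e8)

# Under this toy model the two packages are indistinguishable.
print(math.isclose(weak_many, strong_one))

# With expression regulated to one copy per package, affinity alone decides.
weak_single = retention_score(copies=1, ka=1e6)
print(strong_one / weak_single)
```

In this sketch, lowering the display level is what restores a
separation according to affinity, which is the role the text
assigns to a regulatable promoter.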

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through the
use of regulated promoters such as lacUV5, trpP, or tac
(MANI82). The factors that regulate the quantity of protein
synthesized are sufficiently well understood that a wide
variety of heterologous proteins can now be produced in E.
coli, B. subtilis and other host cells in at least moderate
quantities (BETT88). Preferably, the promoter for the
osp-ipbd gene is subject to regulation by a small chemical
inducer. For example, the lac promoter and the hybrid
trp-lac (tac) promoter are regulatable with isopropyl
thiogalactoside (IPTG). The promoter for the constructed
gene need not come from a natural osp gene; any regulatable
bacterial promoter can be used. A non-leaky promoter is
preferred.

The present invention is not limited to a single method
of gene design. The osp-ipbd gene need not be synthesized
in toto; parts of the gene may be obtained from nature.
One may use any genetic engineering method to produce the
correct gene fusion, so long as one can easily and
accurately direct mutations to specific sites in the pbd DNA
subsequence.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA. The
ambiguity in the genetic code is exploited to allow optimal
placement of restriction sites, to create various
distributions of amino acids at variegated codons, to
minimize the potential for recombination, and to reduce use
of codons that are poorly translated in the host cell.
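
How codon ambiguity allows a restriction site to be placed without
changing the protein can be sketched by enumerating synonymous
encodings. The example below uses the standard genetic code and
the BamHI recognition sequence GGATCC; the dipeptide Gly-Ser and
the function names are illustrative choices of ours, not taken
from this specification.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def encodings(peptide):
    """Yield every DNA sequence that encodes `peptide` (one-letter code)."""
    for codons in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(codons)


def encodings_with_site(peptide, site):
    """Return the synonymous encodings that contain a restriction site."""
    return [dna for dna in encodings(peptide) if site in dna]


# Which encodings of Gly-Ser carry a BamHI site?
print(encodings_with_site("GS", "GGATCC"))
```

The same enumeration could be filtered the other way, to avoid
unwanted sites or to prefer codons well translated in the host.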
V.F Structural Considerations

The design of the amino-acid sequence for the ipbd-osp
gene to encode involves a number of structural
considerations. The design is somewhat different for each
type of GP. In bacteria, OSPs are not essential, so there
is no requirement that the OSP domain of a fusion have any
of its parental functions beyond lodging in the outer
membrane.

It is desirable that the OSP not constrain the
orientation of the PBD domain; this is not to be confused
with lack of constraint within the PBD. Cwirla et al.
(CWIR90), Scott and Smith (SCOT90), and Devlin et al.
(DEVL90) have taught that variable residues in
phage-displayed random peptides should be free of influence
from the phage OSP. We teach that binding domains having a
moderate to high degree of conformational constraint will
exhibit higher specificity and that higher affinity is also
possible. Thus, we prescribe picking codons for variegation
that specify amino acids that will appear in a well-defined
framework. The nature of the side groups is varied through
a very wide range due to the combinatorial replacement of
multiple amino acids. The main chain conformations of most
PBDs of a given class are very similar. The movement of the
PBD relative to the OSP should not, however, be restricted.
Thus it is often appropriate to include a flexible linker
between the PBD and the OSP. Such flexible linkers can be
taken from naturally occurring proteins known to have
flexible regions. For example, the gIII protein of M13
contains glycine-rich regions thought to allow the
amino-terminal domains a high degree of freedom. Such flexible
linkers may also be designed. Segments of polypeptides that
are rich in the amino acids GLY, ASN, SER, and ASP are
likely to give rise to flexibility. Multiple glycines are
particularly preferred.

When we choose to insert the PBD into a surface loop of
an OSP such as LamB, OmpA, or M13 gIII protein, there are a
few considerations that do not arise when PBD is joined to
the end of an OSP. In these cases, the OSP exerts some
constraining influence on the PBD; the ends of the PBD are
held in more or less fixed positions. We could insert a
highly varied DNA sequence into the osp gene at codons that
encode a surface-exposed loop and select for cells that have
a specific-binding phenotype. When the identified
amino-acid sequence is synthesized (by any means), the
constraint of the OSP is lost and the peptide is likely to
have a much lower affinity for the target and a much lower
specificity. Tan and Kaiser (TANN77) found that a synthetic
model of BPTI containing all the amino acids of BPTI that
contact trypsin has a Kd for trypsin about 10^7-fold higher
than that of BPTI. Thus, it is strongly preferred that the
varied amino acids be part of a PBD in which the structural
constraints are supplied by the PBD.
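
To put the Tan and Kaiser comparison on an energy scale, the ratio
of dissociation constants can be converted to a free-energy
difference with the standard relation ddG = RT ln(ratio). The
sketch below reads the reported figure as a 10^7-fold increase in
Kd and assumes a temperature of 298.15 K; neither the temperature
nor the function name comes from this specification.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987e-3


def binding_energy_loss(kd_ratio, temperature_k=298.15):
    """Free-energy penalty (kcal/mol) for a given fold-increase in Kd."""
    return R_KCAL * temperature_k * math.log(kd_ratio)


# Assumed reading: Kd of the synthetic model is 1e7-fold that of BPTI.
ddg = binding_energy_loss(1e7)
print(round(ddg, 1), "kcal/mol")
```

Under these assumptions the loss is roughly 9.5 kcal/mol, which
gives a sense of how much binding energy may depend on the
structural constraint supplied by the intact domain.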

It is known that the amino acids adjoining foreign
epitopes inserted into LamB influence the immunological
properties of these epitopes (VAND90). We expect that PBDs
inserted into loops of LamB, OmpA, or similar OSPs will be
influenced by the amino acids of the loop and by the OSP in
general. To obtain appropriate display of the PBD, it may
be necessary to add one or more linker amino acids between
the OSP and the PBD. Such linkers may be taken from natural
proteins or designed on the basis of our knowledge of the
structural behavior of amino acids. Sequences rich in GLY,
SER, ASN, ASP, ARG, and THR are appropriate. One to five
amino acids at either junction are likely to impart the
desired degree of flexibility between the OSP and the PBD.

A preferred site for insertion of the ipbd gene into
the phage osp gene is one in which: a) the IPBD folds into
its original shape, b) the OSP domains fold into their
original shapes, and c) there is no interference between the
two domains.

If there is a model of the phage that indicates that
either the amino or carboxy terminus of an OSP is exposed to
solvent, then the exposed terminus of that mature OSP
becomes the prime candidate for insertion of the ipbd gene.
A low-resolution 3D model suffices.

In the absence of a 3D structure, the amino and carboxy
termini of the mature OSP are the best candidates for
insertion of the ipbd gene. A functional fusion may require
additional residues between the IPBD and OSP domains to
avoid unwanted interactions between the domains.
Random-sequence DNA, or DNA coding for a specific sequence
of a protein homologous to the IPBD or OSP, can be inserted
between the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also a
good approach for obtaining a functional fusion. Smith
exploited such a boundary when subcloning heterologous DNA
into gene III of f1 (SMIT85).

The criteria for identifying OSP domains suitable for
causing display of an IPBD are somewhat different from those
used to identify an IPBD. When identifying an OSP, minimal
size is not so important because the OSP domain will not
appear in the final binding molecule nor will we need to
synthesize the gene repeatedly in each variegation round.
The major design concerns are that: a) the OSP::IPBD fusion
causes display of IPBD, b) the initial genetic construction
be reasonably convenient, and c) the osp::ipbd gene be
genetically stable and easily manipulated. There are
several methods of identifying domains. Methods that rely
on atomic coordinates have been reviewed by Janin and
Chothia (JANI85). These methods use matrices of distances
between α carbons (Cα), dividing planes (cf. ROSE85), or
buried surface (RASH84). Chothia and collaborators have
correlated the behavior of many natural proteins with domain
structure (according to their definition). Rashin correctly
predicted the stability of a domain comprising residues
206-316 of thermolysin (VITA84, RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify stable
domains. (See, for example, VITA84, POTE83, SCOT87a, and
PABO79.) Pabo et al. used calorimetry as an indicator that
the cI repressor from the coliphage λ contains two domains;
they then used partial proteolysis to determine the location
of the domain boundary.

If the only structural information available is the
amino acid sequence of the candidate OSP, we can use the
sequence to predict turns and loops. There is a high
probability that some of the loops and turns will be
correctly predicted (cf. Chou and Fasman (CHOU74)); these
locations are also candidates for insertion of the ipbd gene
fragment.

In bacterial OSPs, the major considerations are: a)
that the PBD is displayed, and b) that the chimeric protein
not be toxic.

From topological models of OSPs, we can determine
whether the amino or carboxy termini of the OSP are exposed.
If so, then these are excellent choices for fusion of the
osp fragment to the ipbd fragment.

The lamB gene has been sequenced and is available on a
variety of plasmids (CLEM81, CHAR88a,b). Numerous fusions
of fragments of lamB with a variety of other genes have been
used to study export of proteins in E. coli. From various
studies, Charbit et al. (CHAR88a,b) have proposed a model
that specifies which residues of LamB are: a) embedded in
the membrane, b) facing the periplasm, and c) facing the
cell surface; we adopt the numbering of this model for amino
acids in the mature protein. According to this model,
several loops on the outer surface are defined, including:
1) residues 88 through 111, 2) residues 145 through 165, and
3) residues 236 through 251.

Consider a mini-protein embedded in LamB. For example,
insertion of DNA encoding G1-N-X-C-X5-X-X-X-C-X10-S-G12
between codons 153 and 154 of lamB is likely to lead to a
wide variety of LamB derivatives being expressed on the
surface of E. coli cells. G1, N2, S11, and G12 are supplied
to allow the mini-protein sufficient orientational freedom
that it can interact optimally with the target. Using
affinity enrichment (involving, for example, FACS via a
fluorescently labeled target, perhaps through several rounds
of enrichment), we might obtain a strain (named, for
example, BEST) that expresses a particular LamB derivative
that shows high affinity for the predetermined target. An
octapeptide having the sequence of the inserted residues 3
through 10 from BEST is likely to have an affinity and
specificity similar to that observed in BEST because the
octapeptide has an internal structure that keeps the amino
acids in a conformation that is quite similar in the LamB
derivative and in the isolated mini-protein.

Fusing one or more new domains to a protein may make
the ability of the new protein to be exported from the cell
different from the ability of the parental protein. The
signal peptide of the wild-type coat protein may function
for authentic polypeptide but be unable to direct export of
a fusion. To utilize the Sec-dependent pathway, one may
need a different signal peptide. Thus, to express and
display a chimeric BPTI/M13 gene VIII protein, we found it
necessary to utilize a heterologous signal peptide (that of
phoA).

GPs that display peptides having high affinity for the
target may be quite difficult to elute from the target,
particularly a multivalent target. (Bacteria that are bound
very tightly can simply multiply in situ.) For phage, one
can introduce a cleavage site for a specific protease, such
as blood-clotting Factor Xa, into the fusion OSP protein so
that the binding domain can be cleaved from the genetic
package. Such cleavage has the advantage that all resulting
phage have identical OSPs and therefore are equally
infective, even if polypeptide-displaying phage can be
eluted from the affinity matrix without cleavage. This step
allows recovery of valuable genes which might otherwise be
lost. To our knowledge, no one has disclosed or suggested
using a specific protease as a means to recover an
information-containing genetic package or of converting a
population of phage that vary in infectivity into phage
having identical infectivity.

IV.G. Synthesis of Gene Inserts

The present invention is not limited to any particular
method or strategy of DNA synthesis or construction.
Conventional DNA synthesizers may be used, with appropriate
reagent modifications for production of variegated DNA
(similar to that now used for production of mixed probes).

The osp-pbd genes may be created by inserting vgDNA
into an existing parental gene, such as the osp-ipbd shown
to be displayable by a suitably transformed GP. The present
invention is not limited to any particular method of
introducing the vgDNA, e.g., cassette mutagenesis or
single-stranded-oligonucleotide-directed mutagenesis.
IV.H. Operative Cloning Vector

The operative cloning vector (OCV) is a replicable
nucleic acid used to introduce the chimeric ipbd-osp or
pbd-osp gene into the genetic package. When the genetic
package is a virus, it may serve as its own OCV. For cells
and spores, the OCV may be a plasmid, a virus, a phagemid,
or a chromosome.
IV.I. Transformation of Cells

When the GP is a cell, the population of GPs is created
by transforming the cells with suitable OCVs. When the GP
is a phage, the phage are genetically engineered and then
transfected into host cells suitable for amplification.
When the GP is a spore, cells capable of sporulation are
transformed with the OCV while in a normal metabolic state,
and then sporulation is induced so as to cause the OSP-PBDs
to be displayed. The present invention is not limited to
any one method of transforming cells with DNA.

The transformed cells are grown first under
non-selective conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the osp-pbd
gene at the appropriate level of induction. The GPs carrying
the IPBD or PBDs are then harvested by methods appropriate
to the GP at hand, generally centrifugation to pelletize GPs
and resuspension of the pellets in sterile medium (cells) or
buffer (spores or phage). They are then ready for
verification that the display strategy was successful (where
the GPs all display a "test" IPBD) or for affinity selection
(where the GPs display a variety of different PBDs).
IV.J. Verification of Display Strategy

The harvested packages are tested to determine whether
the IPBD is present on the surface. In any tests of GPs for
the presence of IPBD on the GP surface, any ions or
cofactors known to be essential for the stability of IPBD or
AfM(IPBD) are included at appropriate levels. The tests can
be done, e.g., a) by affinity labeling, b) enzymatically,
c) spectrophotometrically, d) by affinity separation, or e)
by affinity precipitation. The AfM(IPBD) in this step is
one picked to have strong affinity (preferably,
K_d < 10^-11 M) for the IPBD molecule and little or no
affinity for the wtGP.

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS

V.A. Affinity Separation Technology, Generally

Affinity separation is used initially in the present
invention to verify that the display system is working,
i.e., that a chimeric outer surface protein has been
expressed and transported to the surface of the genetic
package and is oriented so that the inserted binding domain
is accessible to target material. When used for this
purpose, the binding domain is a known binding domain for a
particular target and that target is the affinity molecule
used in the affinity separation process. For example, a
display system may be validated by inserting DNA encoding
BPTI into a gene encoding an outer surface protein of the
genetic package of interest, and testing for binding to
anhydrotrypsin, which is normally bound by BPTI.

If the genetic packages bind to the target, then we
have confirmation that the corresponding binding domain is
indeed displayed by the genetic package. Packages which
display the binding domain (and thereby bind the target)
are separated from those which do not.

Once the display system is validated, it is possible to
use a variegated population of genetic packages which
display a variety of different potential binding domains,
and use affinity separation technology to determine how well
they bind to one or more targets. This target need not be
one bound by a known binding domain which is parental to the
displayed binding domains, i.e., one may select for binding
to a new target.

The term "affinity separation means" includes, but is
not limited to: a) affinity column chromatography, b) batch
elution from an affinity matrix material, c) batch elution
from an affinity material attached to a plate, d)
fluorescence activated cell sorting, and e) electrophoresis
in the presence of target material. "Affinity material" is
used to mean a material with affinity for the material to be
purified, called the "analyte". In most cases, the
association of the affinity material and the analyte is
reversible so that the analyte can be freed from the
affinity material once the impurities are washed away.

V.B. Affinity Chromatography, Generally

Affinity column chromatography, batch elution from an
affinity matrix material held in some container, and batch
elution from a plate are very similar and hereinafter will
be treated under "affinity chromatography."

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be applied
to a solid support suitable for affinity separation,

2) after application to a matrix, the target material
preferably does not react with water,

3) after application to a matrix, the target material
preferably does not bind or degrade proteins in a
non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to a matrix
allows enough unaltered surface area (generally at
least 500 Å^2, excluding the atom that is connected to
the linker) for protein binding.

Affinity chromatography is the preferred separation
means, but FACS, electrophoresis, or other means may also be
used.

The present invention makes use of affinity separation
of bacterial cells, or bacterial viruses (or other genetic
packages) to enrich a population for those cells or viruses
carrying genes that code for proteins with desirable binding
properties.
V.C. Target Materials

The present invention may be used to select for binding
domains which bind to one or more target materials, and/or
fail to bind to one or more target materials. Specificity,
of course, is the ability of a binding molecule to bind
strongly to a limited set of target materials, while binding
more weakly or not at all to another set of target materials
from which the first set must be distinguished.

The target materials may be organic macromolecules,
such as polypeptides, lipids, polynucleic acids, and
polysaccharides, but are not so limited. The present
invention is not, however, limited to any of the
above-identified target materials. The only limitation is
that the target material be suitable for affinity
separation. Thus, almost any molecule that is stable in
aqueous solvent may be used as a target.

Serine proteases such as human neutrophil elastase
(HNE) are an especially interesting class of potential
target materials. Serine proteases are ubiquitous in living
organisms and play vital roles in processes such as:
digestion, blood clotting, fibrinolysis, immune response,
fertilization, and post-translational processing of peptide
hormones. Although the role these enzymes play is vital,
uncontrolled or inappropriate proteolytic activity can be
very damaging.

V.D. Immobilization or Labeling of Target Material

For chromatography, FACS, or electrophoresis there may
be a need to covalently link the target material to a second
chemical entity. For chromatography the second entity is a
matrix, for FACS the second entity is a fluorescent dye, and
for electrophoresis the second entity is a strongly charged
molecule. In many cases, no coupling is required because
the target material already has the desired property of: a)
immobility, b) fluorescence, or c) charge. In other cases,
chemical or physical coupling is required.

It is not necessary that the actual target material be
used in preparing the immobilized or labeled analogue that
is to be used in affinity separation; rather, suitable
reactive analogues of the target material may be more
convenient. Target materials that do not have reactive
functional groups may be immobilized by first creating a
reactive functional group through the use of some powerful
reagent, such as a halogen. In some cases, the reactive
groups of the actual target material may occupy a part on
the target molecule that is to be left undisturbed. In that
case, additional functional groups may be introduced by
synthetic chemistry.

Two very general methods of immobilization are widely
used. The first is to biotinylate the compound of interest
and then bind the biotinylated derivative to immobilized
avidin. The second method is to generate antibodies to the
target material, immobilize the antibodies by any of
numerous methods, and then bind the target material to the
immobilized antibodies. Use of antibodies is more
appropriate for larger target materials; small targets
(those comprising, for example, ten or fewer non-hydrogen
atoms) may be so completely engulfed by an antibody that
very little of the target is exposed in the target-antibody
complex.

Non-covalent immobilization of hydrophobic molecules
without resort to antibodies may also be used. A compound,
such as 2,3,3-trimethyldecane, is blended with a matrix
precursor, such as sodium alginate, and the mixture is
extruded into a hardening solution. The resulting beads
will have 2,3,3-trimethyldecane dispersed throughout and
exposed on the surface.

Other immobilization methods depend on the presence of
particular chemical functionalities. A polypeptide will
present -NH2 (N-terminal; Lysines), -COOH (C-terminal;
Aspartic Acids; Glutamic Acids), -OH (Serines; Threonines;
Tyrosines), and -SH (Cysteines). For the reactivity of
amino acid side chains, see CREI84. A polysaccharide has
free -OH groups, as does DNA, which has a sugar backbone.

Matrices suitable for use as support materials include
polystyrene, glass, agarose and other chromatographic
supports, and may be fabricated into beads, sheets, columns,
wells, and other forms as desired.

Early in the selection process, relatively high
concentrations of target materials may be applied to the
matrix to facilitate binding; target concentrations may
subsequently be reduced to select for higher affinity SBDs.

V.E. Elution of Lower Affinity PBD-Bearing Genetic Packages

The population of GPs is applied to an affinity matrix
under conditions compatible with the intended use of the
binding protein and the population is fractionated by
passage of a gradient of some solute over the column. The
process enriches for PBDs having affinity for the target and
for which the affinity for the target is least affected by
the eluants used. The enriched fractions are those
containing viable GPs that elute from the column at greater
concentration of the eluant.

The eluants preferably are capable of weakening
noncovalent interactions between the displayed PBDs and the
immobilized target material. Preferably, the eluants do not
kill the genetic package; the genetic message corresponding
to successful mini-proteins is most conveniently amplified
by reproducing the genetic package rather than by in vitro
procedures such as PCR. The list of potential eluants
includes salts (including Na+, NH4+, Rb+, SO4--, H2PO4-,
citrate, K+, Li+, Cs+, HSO4-, CO3--, Ca++, Sr++, Cl-, PO4---,
HCO3-, Mg++, Ba++, Br-, HPO4-- and acetate), acid, heat,
compounds known to bind the target, and soluble target
material (or analogues thereof).

The uneluted genetic packages contain DNA encoding
binding domains which have a sufficiently high affinity for
the target material to resist the elution conditions. The
DNA encoding such successful binding domains may be
recovered in a variety of ways. Preferably, the bound
genetic packages are simply eluted by means of a change in
the elution conditions. Alternatively, one may culture the
genetic package in situ, or extract the target-containing
matrix with phenol (or other suitable solvent) and amplify
the DNA by PCR or by recombinant DNA techniques.
Additionally, if a site for a specific protease has been
engineered into the display vector, the specific protease is
used to cleave the binding domain from the GP.

Nonspecific binding to the matrix, etc., may be
identified or reduced by techniques well known in the
affinity separation art.
V.F. Recovery of Packages

Recovery of packages that display binding to an
affinity column may be achieved in several ways, including:

1) collect fractions eluted from the column with a
gradient as described above; fractions eluting later
in the gradient contain GPs more enriched for genes
encoding PBDs with high affinity for the column,

2) elute the column with the target material in soluble
form,

3) flood the matrix with a nutritive medium and grow the
desired packages in situ,

4) remove parts of the matrix and use them to inoculate
growth medium,

5) chemically or enzymatically degrade the linkage
holding the target to the matrix so that GPs still
bound to target are eluted, or

6) degrade the packages and recover DNA with phenol or
other suitable solvent; the recovered DNA is used to
transform cells that regenerate GPs.

It is possible to utilize combinations of these methods. It
should be remembered that what we want to recover from the
affinity matrix is not the GPs per se, but the information
in them. Recovery of viable GPs is very strongly preferred,
but recovery of genetic material is essential. If cells,
spores, or virions bind irreversibly to the matrix but are
not killed, we can recover the information through in situ
cell division, germination, or infection respectively.
Proteolytic degradation of the packages and recovery of DNA
is not preferred.

V.G. Amplifying the Enriched Packages

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium, or, in the case
of phage, infection into a host so cultivated. If the GPs
have been inactivated by the chromatography, the OCV
carrying the osp-pbd gene is recovered from the GP and
introduced into a new, viable host.

V.H. Characterizing the Putative SBDs

For one or more clonal isolates, we may subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced as a
free protein. Physical measurements of the strength of
binding may be made for each free SBD protein by any
suitable method.

If we find that the binding is not yet sufficient, we
decide which residues of the SBD (now a new PPBD) to vary
next. If the binding is sufficient, then we now have an
expression vector bearing a gene encoding the desired novel
binding protein.

. *^Ttrt^e ~ of the^ethod 

scribed to select a molecule that binds to material A but
not to material B, or that binds to both A and B, either
alternatively or simultaneously.
Y"iT «"°<i Mr "'° " f Antagonists 

It may be desirable to provide an antagonist to an
enzyme or receptor. This may be achieved by making a
molecule that prevents the natural substrate or agonist from

the active site may be either agonists or antagonists. Thus
we adopt the following strategy. We consider enzymes and
receptors together under the designation TER (Target Enzyme
or Receptor).

For most TERs, there exist chemical inhibitors that
block the active site. Usually, these chemicals are useful
only as research tools due to high toxicity. We make two
affinity matrices: one with active TER and one with blocked
TER. We make a variegated population of GP(PBD)s and select
for SBDs that bind to both forms of the enzyme, thereby
obtaining SBDs that do not bind to the active site. We
expect that SBDs will be found that bind different places on
the enzyme surface. Pairs of the sbd genes are fused with
an intervening peptide segment. For example, if SBD-1 and
SBD-2 are binding domains that show high affinity for the
target enzyme and for which the binding is non-competitive,
then the gene sbd-1::linker::sbd-2 encodes a two-domain
protein that will show high affinity for the target. We
make several fusions having a variety of SBDs and various
linkers. Such compounds have a reasonable probability of
being an antagonist to the target enzyme.

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND
CORRESPONDING DNAS

While the SBD may be produced by recombinant DNA
techniques, an advantage inhering from the use of a
mini-protein as an IPBD is that it is likely that the
derived SBD will also behave like a mini-protein and will be
obtainable by means of chemical synthesis. (The term
"chemical synthesis", as used herein, includes the use of
enzymatic agents in a cell-free environment.)

It is also to be understood that mini-proteins obtained
by the method of the present invention may be taken as lead
compounds for a series of homologues that contain
non-naturally occurring amino acids and groups other than
amino acids. For example, one could synthesize a series of
homologues in which each member of the series has one amino
acid replaced by its D enantiomer. One could also make
homologues containing constituents such as alanine,
aminobutyric acid, 3-hydroxyproline, 2-aminoadipic acid,
N-ethylasparagine, norvaline, etc.; these would be tested
for binding and other properties of interest, such as
stability and toxicity.

Peptides may be chemically synthesized either in
solution or on supports. Various combinations of stepwise
synthesis and fragment condensation may be employed.

During synthesis, the amino acid side chains are
protected to prevent branching. Several different
protective groups are useful for the protection of the thiol
groups of cysteines:

1) 4-methoxybenzyl (MBzl; Mob) (NISH82; ZAFA88), removable
with HF;
2) acetamidomethyl (Acm) (NISH82; NISH86; BECK89c),
removable with iodine; mercury ions (e.g., mercuric
acetate); silver nitrate; and
3) S-para-methoxybenzyl (HOUG84).

Other thiol protective groups may be found in standard
reference works such as Greene, PROTECTIVE GROUPS IN ORGANIC
SYNTHESIS (1981).

Once the polypeptide chain has been synthesized,
disulfide bonds must be formed. Possible oxidizing agents
include air (HOUG84; NISH86), ferricyanide (NISH82; HOUG84),
iodine (NISH82), and performic acid (HOUG84). Temperature,
pH, solvent, and chaotropic chemicals may affect the course
of the oxidation.

A large number of micro-proteins with a plurality of
disulfide bonds have been chemically synthesized in
biologically active form: conotoxin GI (13 AA, 4 Cys)
(NISH82); heat-stable enterotoxin ST (18 AA, 6 Cys)
(HOUG84); analogues of ST (BHAT86); ω-conotoxin GVIA (27 AA,
6 Cys) (NISH86; RIVI87b); ω-conotoxin MVIIA (27 AA, 6 Cys)
(OLIV87b); α-conotoxin SI (13 AA, 4 Cys) (ZAFA88);
μ-conotoxin IIIa (22 AA, 6 Cys) (BECK89c, CRUZ89, HATA90).
Sometimes, the polypeptide naturally folds so that the
correct disulfide bonds are formed. Other times, it must be
helped along by use of a differently removable protective
group for each pair of cysteines.
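One way to see why differently removable protective groups
can be needed is to count how many distinct disulfide
pairings are possible for a given number of cysteines. The
sketch below enumerates them; the cysteine positions are
hypothetical placeholders, and only the counts matter.

```python
def disulfide_pairings(cys_positions):
    """Yield every way of pairing all cysteines into disulfides."""
    if not cys_positions:
        yield []
        return
    first, rest = cys_positions[0], cys_positions[1:]
    for i, partner in enumerate(rest):
        # Pair the first cysteine with one partner, then pair the rest.
        remaining = rest[:i] + rest[i + 1:]
        for tail in disulfide_pairings(remaining):
            yield [(first, partner)] + tail


# Hypothetical cysteine positions; only the count matters here.
four_cys = list(disulfide_pairings([2, 3, 7, 13]))
six_cys = list(disulfide_pairings([1, 8, 15, 16, 19, 26]))
print(len(four_cys), len(six_cys))
```

A 4-Cys peptide has 3 possible pairings and a 6-Cys peptide
has 15, so random oxidation is increasingly unlikely to give
only the desired isomer unless folding directs it.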

The successful binding domains of the present invention
may, alone or as part of a larger protein, be used for any
purpose for which binding proteins are suited, including
isolation or detection of target materials. In furtherance
of this purpose, the novel binding proteins may be coupled
directly or indirectly, covalently or noncovalently, to a
label, carrier or support.

When used as a pharmaceutical, the novel binding
proteins may be contained with suitable carriers or
adjuvants.

EXAMPLE I

DESIGN AND MUTAGENESIS OF A CLASS 1 MICRO-PROTEIN

To obtain a library of binding domains that are
conformationally constrained by a single disulfide, we
insert DNA coding for the following family of micro-proteins
into the gene coding for a suitable OSP.

    X1 - C - X2 - X3 - X4 - X5 - C - X6
         |_______________________|

where the bracket indicates disulfide bonding. Disulfides
normally do not form between cysteines that are consecutive
on the polypeptide chain. One or more of the residues
indicated above as X will be varied extensively to obtain
novel binding. There may be one or more amino acids that
precede X1 or follow X6; however, the residues before X1 or
after X6 will not be significantly constrained by the
diagrammed disulfide bridge, and it is less advantageous to
vary these remote, unbridged residues. The last X residue
is connected to the OSP of the genetic package.

X1, X2, X3, X4, X5, and X6 can be varied independently;
i.e., a different scheme of variegation could be used at
each position. X1 and X6 are the least constrained residues
and may be varied less than other positions.

X1 and X6 can be, for example, one of the amino acids
[E, K, T, and A]; this set of amino acids is preferred
because: a) the possibility of positively charged,
negatively charged, and neutral amino acids is provided,
b) these amino acids can be provided in 1:1:1:1 ratio via
the codon RMG (R = equimolar A and G, M = equimolar A and
C), and c) these amino acids allow proper processing by
signal peptidases.
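The 1:1:1:1 claim for the RMG codon can be checked by
expanding the degenerate codon against the standard genetic
code. The following is a minimal sketch; the helper names
(expand, amino_acid_counts) are illustrative and not part of
the described method.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

# Degenerate base symbols used in the text.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "R": "AG", "M": "AC", "K": "GT", "N": "ACGT"}


def expand(codon):
    """List the concrete codons covered by a degenerate codon."""
    return ["".join(c) for c in product(*(DEGENERATE[b] for b in codon))]


def amino_acid_counts(codon):
    """Count how many concrete codons encode each amino acid."""
    return Counter(CODON_TABLE[c] for c in expand(codon))


rmg = amino_acid_counts("RMG")
print(sorted(rmg.items()))
```

RMG expands to AAG, ACG, GAG, and GCG, which encode K, T, E,
and A once each, so equimolar mixing at R and M gives the
four amino acids in equal proportion.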

In a preferred embodiment, X2, X3, X4, and X5 are
initially variegated by encoding each by the codon NNT,
which encodes the substitution set [F, S, Y, C, L, P, H, R,
I, T, N, V, A, D, and G].

The advantages of the NNT over the NNK codon become
increasingly apparent as the number of variegated codons
increases. Tables 10 and 130 compare libraries in which six
codons have been varied either by NNT or NNK codons. NNT
encodes 15 different amino acids and only 16 DNA sequences.
Thus, there are 1.139 x 10^7 amino-acid sequences, no stops,
and only 1.678 x 10^7 DNA sequences. A library of 10^8
independent transformants will contain 99% of all possible
sequences. The NNK library contains 6.4 x 10^7 amino-acid
sequences, but complete sampling requires a much larger
number of independent transformants.
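The library sizes quoted above follow from simple counting,
and the sampling argument can be sketched with a Poisson
approximation. In the sketch below, the library size of 10^8
transformants is an illustrative assumption, as is the
premise that every DNA sequence is equally likely; the
function names are hypothetical.

```python
import math


def library_sizes(n_codons, n_amino_acids, n_positions):
    """Return (DNA sequences, amino-acid sequences) for a library."""
    return n_codons ** n_positions, n_amino_acids ** n_positions


def expected_coverage(n_transformants, n_dna_sequences):
    """Expected fraction of DNA sequences sampled at least once.

    Assumes every DNA sequence is equally likely and that
    transformants are independent (Poisson approximation).
    """
    return 1.0 - math.exp(-n_transformants / n_dna_sequences)


POSITIONS = 6
nnt_dna, nnt_aa = library_sizes(16, 15, POSITIONS)  # NNT: no stop codons
nnk_dna, nnk_aa = library_sizes(32, 20, POSITIONS)  # NNK: 32 codons, incl. TAG

N_TRANSFORMANTS = 10**8  # illustrative library size (assumption)
print(nnt_dna, nnt_aa, expected_coverage(N_TRANSFORMANTS, nnt_dna))
print(nnk_dna, nnk_aa, expected_coverage(N_TRANSFORMANTS, nnk_dna))
```

Under these assumptions the NNT library (about 1.68 x 10^7
DNA sequences) is more than 99% sampled, while the same
number of transformants covers only about 9% of the roughly
1.07 x 10^9 NNK DNA sequences.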

This sequence can be displayed as a fusion to the gene
III protein of M13 using the native M13 gene III promoter
and signal sequence. The sequence of M13 gene III protein,
from residue 16 to 23, is S16HSAETVE23; signal peptidase-I
cleaves after S18. We replace this segment with
S16GA18AEGX1CX2X3X4X5CX6SYIEGRVIETVE.
Note that changing H17S18 to GA does not impair the phage
for infectivity. It is useful to insert a bovine F.Xa
recognition/cleavage site (YIEGR/VI) between the PBD and the
mature III protein; this not only allows orientational
freedom for the PBD, but also allows cleavage of the PBD
from the GP.

A phage library in which X1, X2, X5, and X6 are encoded
by NNT (allowing F, S, Y, C, L, P, H, R, I, T, N, V, A, D,
& G) and in which X3 and X4 are encoded by NNG (allowing L,
S, W, P, Q, R, M, T, K, V, A, E, and G) is named TN2. This
library displays about 8.55 x 10^6 micro-proteins encoded by
about 1.5 x 10^7 DNA sequences. NNG is used at the third and
fourth variable positions (the central positions of the
disulfide-closed loop) at least in part to avoid the
possibility of cysteines at these positions.
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Devlin, et al., screened 10 7 • transformants, each of 
which could display one of 10 12 random pentadecapeptides , for 
affinity, with streptavidin, and found 20 • streptavidin- 
binding phage isolates, with eight unique sequences ( "A" - 
5 ..All contained HP; 15/20, HPQ; and 6/20, HPQF, though 

-in. different positions within the pentadecapeptide . The 
. most frequently encountered isolates were P (5), r I (4) , and 
A(3) , which entirely, lacked cysteines:'^ 
positive isolates, "E" (l) and «F "(2), 'included a pair of 
10 cysteines positioned so that formation of a disulfide bond 
was possible. The sequences of these isolates is given, in 
Table 820. 

We recognized that our TN2 library should include a
putative micro-protein, HPQ, similar enough to Devlin's "E"
and "F" peptides to have the potential of exhibiting
streptavidin-binding activity. HPQ comprises the AEG amino
terminal sequence common to all members of the TN2 library,
followed by the sequence PCHPQFCQ which has the potential
for forming a disulfide bridge with a span of four, followed
by a serine (S) and a bovine factor Xa recognition site
(YIEGR/VI) (see Table 820). Pilot experiments showed that
the binding of HPQ-bearing phage to streptavidin was
comparable to that of Devlin's "F" isolate; both were
marginally above background (1.7x). We therefore screened
our TN2 library against immobilized streptavidin.

Streptavidin is available as free protein (Pierce) with
a specific activity of 14.6 units per mg (1 unit will bind
1 µg of biotin). A stock solution of 1 mg per mL in PBS
containing 0.01% azide is made. 100 µL of StrAv stock is
added to each 250 µL capacity well of Immulon (#4) plates
and incubated overnight at 4°C. The stock is removed and
replaced with 250 µL of PBS containing BSA at a
concentration of 1 mg/mL and left at 4°C for a further 1
hour. Prior to use in a phage binding assay the wells are
washed rapidly 5 times with 250 µL of PBS containing 0.1%
Tween.

To each StrAv-coated well is added 100 µL of binding
buffer (PBS with 1 mg per mL BSA) containing a known
5 quantity of phage <10» pfu's of the TN2 library). 
Incubation proceeds for 1 hr at room temperature followed by
removal of the non-bound phage and 10 rapid washes with PBS
0.1% Tween, then further washes with citrate buffers of pH
7, 6 and 5 to remove non-specific binding. The bound phage
are eluted with 250 µL of pH 2 citrate buffer containing 1 mg
per mL BSA and neutralized with 60 µL of 1 M tris pH 8.
The eluate was used to infect bacterial cells which
generated a new phage stock to be used for a further round
of binding, washing and elution. The enhancement cycles
were repeated two more times (three in total), after which
a number of individual phage were sequenced and tested
as clonal isolates. The number of phage present in each
step is determined as plaque forming units (pfu's) following
appropriate dilutions and plating in a lawn of F'-containing
E. coli.
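
The titering step described above reduces to a short calculation. The sketch below is only an illustration: the function names and the plaque count, dilution, and plated volume are hypothetical and are not taken from the experiments reported here.

```python
def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Back-calculate phage concentration (pfu/mL) from a plaque count.

    dilution is the overall dilution of the plated sample relative to the
    stock (e.g. 1e-3); volume_plated_ml is the volume plated in the lawn.
    """
    if plaques < 0 or dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    return plaques / (dilution * volume_plated_ml)

def fraction_recovered(output_pfu, input_pfu):
    """Fraction of the input phage recovered in the eluate."""
    return output_pfu / input_pfu

# Hypothetical example: 42 plaques from 0.1 mL of a 1e-3 dilution of the eluate.
eluate_titer = titer_pfu_per_ml(plaques=42, dilution=1e-3, volume_plated_ml=0.1)
# The neutralized eluate is 250 uL of citrate buffer plus 60 uL of tris (0.31 mL).
eluate_total = eluate_titer * 0.31
print(f"eluate titer: {eluate_titer:.2e} pfu/mL, total: {eluate_total:.2e} pfu")
```

Comparing the fraction recovered from round to round is one simple way to follow the enrichment produced by the enhancement cycles.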

Table 838 shows the peptide sequences found to bind to
StrAv and their frequency in the random picks taken from the
final (round 3) phage pool.

The intercysteine segment of all of the putative micro-
proteins examined contained the HPQF motif. The variable
residue before the first cysteine could have contained any
of {F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G}; the residues selected
were {Y,H,L,D,N} while phage HPQ has P. The variable
residue after the second cysteine also could have had
{F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G}; the residues selected were
{P,S,G,R,V} while phage HPQ has Q. The relatively poor
binding of phage HPQ could be due to P4 or to Q13 or both.

In a control experiment, the TN2 library was screened
in an identical manner to that shown above but with the
target protein being the blocking agent BSA. Following
three rounds of binding, elution, and amplification, sixteen
random phage plaques were picked and sequenced. Half of the
clones demonstrated a lack of insert (8/16); the other half
had the sequences shown in Table 839. There is no consensus
for this collection.

We have displayed a related micro-protein, HPQ6, on
phage. It is identical to HPQ except for the replacement of
CHPQFC with CHPQFPRC (see Table 820). When displayed, HPQ6
had a substantially stronger affinity for streptavidin than
either HPQ or Devlin's "F" isolate. (Devlin's "E" isolate
was not studied.) Treatment with dithiothreitol (DTT)
markedly reduced the binding of HPQ6 phage (but not control
phage) to streptavidin, suggesting that the presence of a
disulfide bridge within the displayed peptide was required
for good binding. In view of the results of the screening
of the TN2 library, it is likely that the binding of phage
HPQ6 could be further improved by changing P4 to one of
{Y,H,L,D,N} and/or changing Q13 to one of {P,S,G,R,V}.

EXAMPLE II

A CYS::HELIX::TURN::STRAND::CYS UNIT

The parental Class 2 micro-protein may be a naturally-
occurring class 2 micro-protein. It may also be a domain of
a larger protein whose structure satisfies or may be
modified so as to satisfy the criteria of a class 2 micro-
protein. The modification may be a simple one, such as the
introduction of a cysteine (or a pair of cysteines) into the
base of a hairpin structure so that the hairpin may be
closed off with a disulfide bond, or a more elaborate one,
such as the modification of intermediate residues so as to
achieve the hairpin structure. The parental class 2 micro-
protein may also be a composite of structures from two or
more naturally-occurring proteins, e.g., an α helix of one
protein and a β strand of a second protein.

One micro-protein motif of potential use comprises a
disulfide loop enclosing a helix, a turn, and a return
strand. Such a structure could be designed or it could be
obtained from a protein of known 3D structure. Scorpion
neurotoxin, variant 3 (ALMA83a, ALMA83b) (hereafter
ScorpTx), contains a structure diagrammed in Figure 1 that
comprises a helix (residues N22 through N33), a turn
(residues 33 through 35), and a return strand (residues 36
through 41). ScorpTx contains disulfides that join residues
12-65, 16-41, 25-46, and 29-48. CYS25 and CYS41 are quite
close and could be joined by a disulfide without deranging
the main chain. Figure 1 shows CYS25 joined to CYS41. In
addition, CYS29 has been changed to GLN. It is expected that
a disulfide will form between 25 and 41 and that the helix
shown will form; we know that the amino-acid sequence shown
is highly compatible with this structure. The presence of
GLY35, GLY36, and GLY39 gives the turn and extended strand
sufficient flexibility to accommodate any changes needed
around CYS41 to form the disulfide.

From examination of this structure (as found in entry
1SN3 of the Brookhaven Protein Data Bank), we see that the
following sets of residues would be preferred for variegation:



WO 92/15677 



PCT/US92/01456 




SET 1

      Residue   Codon   Allowed amino acids    Naa/Ndna
  1)  T27       NNG     L^2R^2MVSPTAQKEWG.     13/15
  2)  E28       VHG     LMVPTAQKE              9/9
  3)  A31       VHG     LMVPTAQKE              9/9
  4)  K32       VHG     LMVPTAQKE              9/9
  5)  G24       NNG     L^2R^2MVSPTAQKEWG.     13/15
  6)  E23       VHG     LMVPTAQKE              9/9
  7)  Q34       VAS     HQNKED                 6/6

Note: Exponents on amino acids indicate multiplicity of
codons.

Positions 27, 28, 31, 32, 24, and 23 comprise one face
of the helix. At each of these locations we have picked a
variegating codon that a) includes the parental amino acid,
b) includes a set of residues having a predominance of helix
favoring residues, c) provides for a wide variety of amino
acids, and d) leads to as even a distribution as possible.
Position 34 is part of a turn. The side group of residue 34
could interact with molecules that contact the side groups
of residues 27, 28, 31, 32, 24, and 23. Thus we allow
variegation here and provide amino acids that are compatible
with turns. The variegation shown leads to 6.65 · 10^6 amino
acid sequences encoded by 8.85 · 10^6 DNA sequences.
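
The amino-acid sets and sequence counts for a variegation scheme can be derived mechanically from the degenerate codons. The following sketch is an illustration under stated assumptions: it uses the standard genetic code and the usual IUPAC ambiguity codes, counts only non-stop codons as DNA sequences (as the Naa/Ndna column for NNG does), and its function and variable names are hypothetical.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# IUPAC ambiguity codes for mixed-base positions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(vg_codon):
    """Return (set of encoded amino acids, number of non-stop codons)."""
    codons = ["".join(c) for c in product(*(IUPAC[x] for x in vg_codon))]
    sense = [c for c in codons if CODON_TABLE[c] != "*"]
    return {CODON_TABLE[c] for c in sense}, len(sense)

# Codons of Set 1, in table order (residues 27, 28, 31, 32, 24, 23, 34).
set1 = ["NNG", "VHG", "VHG", "VHG", "NNG", "VHG", "VAS"]
n_aa = prod(len(expand(c)[0]) for c in set1)
n_dna = prod(expand(c)[1] for c in set1)

for codon in sorted(set(set1)):
    aas, n = expand(codon)
    print(codon, "".join(sorted(aas)), f"{len(aas)}/{n}")
print(f"Set 1: {n_aa:.3e} amino-acid sequences, {n_dna:.3e} DNA sequences")
```

The same helper can be applied to any of the other variegation tables in this section by substituting the appropriate list of codons.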



SET 2

      Residue   Codon   Allowed amino acids          Naa/Ndna
  1)  D26       VHS     L^2IMV^2P^2T^2A^2HQNKDE      13/18
  2)  T27       NNG     L^2R^2MVSPTAQKEWG.           13/15
  3)  K30       VHG     KEQPTALMV                    9/9
  4)  A31       VHG     KEQPTALMV                    9/9
  5)  K32       VHG     LMVPTAQKE                    9/9
  6)  S37       RRT     SNDG                         4/4
  7)  Y38       NHT     YSFHPLNTIDAV                 9/9

Positions 26, 27, 30, 31, and 32 are variegated so as
to enhance helix-favoring amino acids in the population.
Residues 37 and 38 are in the return strand so that we pick
different variegation codons. This variegation allows
4.43 · 10^6 amino-acid sequences and 7.08 · 10^6 DNA sequences.
Thus a library that embodies this scheme can be sampled very
efficiently.

EXAMPLE III

DESIGN AND MUTAGENESIS OF CLASS 3 MICRO-PROTEIN

Two Disulfide Bond Parental Micro-Proteins

Micro-proteins with two disulfide bonds may be modelled
after the α-conotoxins, e.g., GI, GIA, GII, MI, and SI.
These have the following conserved structure:

          1 2         1'        2'
(1-2 AAs)-C-C-(3 AAs)-C-(5 AAs)-C-(0-5 AAs)



Hashimoto et al. (HASH85) reported synthesis of twenty-
four analogues of α conotoxins GI, GII, and MI. Using the
numbering scheme for GI (CYS at positions 2, 3, 7, and 13),
Hashimoto et al. reported alterations at 4, 8, 10, and 12
that allow the proteins to be toxic. Almquist et al.
(ALMQ89) synthesized [des-GLU1] α Conotoxin GI and twenty
analogues. They found that substituting GLY for PRO5 gave
rise to two isomers, perhaps related to different disulfide
bonding. They found a number of substitutions at residues
8 through 11 that allowed the protein to be toxic. Zafar-
alla et al. (ZAFA88) found that substituting PRO at position
9 gives an active protein. Each of the groups cited used
only in vivo toxicity as an assay for the activity. From
such studies, one can infer that an active protein has the
parental 3D structure, but one can not infer that an
inactive protein lacks the parental 3D structure.

Pardi et al. (PARD89) determined the 3D structure of α
Conotoxin GI obtained from venom by NMR. Kobayashi et al.
(KOBA89) have reported a 3D structure of synthetic α
Conotoxin GI from NMR data which agrees with that of PARD89.
We refer to Figure 5 of Pardi et al.

Residue GLU1 is known to accommodate GLU, ARG, and ILE
in known analogues or homologues. A preferred variegation
codon is NNG that allows the set of amino acids
[L^2R^2MVSPTAQKEWG<stop>]. From Figure 5 of Pardi et al. we
see that the side group of GLU1 projects into the same region
as the strand comprising residues 9 through 12. Residues 2
and 3 are cysteines and are not to be varied. The side group
of residue 4 points away from residues 9 through 12; thus we
defer varying this residue until a later round. PRO5 may be
needed to cause the correct disulfides to form; when GLY was
substituted here the peptide folded into two forms, neither
of which is toxic. It is allowed to vary PRO5, but not
preferred in the first round.

No substitutions at ALA6 have been reported. A
preferred variegation codon is RMG which gives rise to ALA,
THR, LYS, and GLU (small hydrophobic, small hydrophilic,
positive, and negative). CYS7 is not varied. We prefer to
leave GLY8 as is, although a homologous protein having ALA8
is toxic. Homologous proteins having various amino acids at
position 9 are toxic; thus, we use an NNT variegation codon
which allows FS^2YCLPHRITNVADG. We use NNT at positions 10,
11, and 12 as well. At position 14, following the fourth
CYS, we allow ALA, THR, LYS, or GLU (via an RMG codon).
This variegation allows 1.053 · 10^7 amino-acid sequences,
encoded by 1.68 · 10^7 DNA sequences. Libraries having
2.0 · 10^7, 3.0 · 10^7, and 5.0 · 10^7 independent transformants
will, respectively, display ~70%, ~83%, and ~95% of the allowed
sequences. Other variegations are also appropriate.
Concerning α conotoxins, see, inter alia, ALMQ89, CRUZ85,
GRAY83, GRAY84, and PARD89.
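
The relation between library size and the percentage of sequences displayed can also be inverted to ask how many independent transformants a target coverage requires. This is a minimal sketch, assuming equiprobable DNA sequences and Poisson sampling; the function name is hypothetical, and the DNA count uses 16 codons for NNG, which reproduces the 1.68 · 10^7 figure quoted above.

```python
import math

def transformants_needed(target_fraction, n_dna):
    """Independent transformants needed so that the expected fraction of
    DNA sequences represented at least once reaches target_fraction."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    return math.ceil(-n_dna * math.log(1.0 - target_fraction))

# Position 1 (NNG, 16 codons), position 6 (RMG, 4), positions 9-12 (NNT, 16 each),
# position 14 (RMG, 4).
N_DNA = 16 * 4 * 16 ** 4 * 4

for target in (0.70, 0.83, 0.95, 0.99):
    n = transformants_needed(target, N_DNA)
    print(f"{target:.0%} coverage needs about {n:.2e} transformants")
```

The first three targets come out close to the 2.0 · 10^7, 3.0 · 10^7, and 5.0 · 10^7 transformants cited in the text.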

The parental micro-protein may instead be one of the
proteins designated "Hybrid-I" and "Hybrid-II" by Pease et
al. (PEAS90); cf. Figure 4 of PEAS90. One preferred set of
residues to vary for either protein consists of:

Parental      Variegated   Allowed               AA seqs/
amino acid    Codon        amino acids           DNA seqs
A5            RVT          ADGTNS                6/6
P6            VYT          PTALIV                6/6
E7            RRS          EDNKSRG               7/8
T8            VHG          TPALMVQKE             9/9
A9            VHG          ATPLMVQKE             9/9
A10           RMG          AEKT                  4/4
K12           VHG          KQETPALMV             9/9
Q16           NNG          L^2R^2SWPQMTKVAEG     13/15

This provides 9.55 · 10^6 amino-acid sequences encoded by
1.26 · 10^7 DNA sequences. A library comprising 5.0 · 10^7
transformants allows expression of 98.2% of all possible
sequences. At each position, the parental amino acid is
allowed.

At position 5 we provide amino acids that are compati-
ble with a turn. At position 6 we allow ILE and VAL because
they have branched β carbons and make the chain rigid. At
position 7 we allow ASP, ASN, and SER that often appear at
the amino termini of helices. At positions 8 and 9 we allow
several helix-favoring amino acids (ALA, LEU, MET, GLN, GLU,
and LYS) that have differing charges and hydrophobicities
because these are part of the helix proper. Position 10 is
further around the edge of the helix, so we allow a smaller
set (ALA, THR, LYS, and GLU). This set not only includes 3
helix-favoring amino acids plus THR that is well tolerated
but also allows positive, negative, and neutral hydrophilic
residues. The side groups of 12 and 16 project into the same
region as the residues already recited. At these positions we
allow a wide variety of amino acids with a bias toward helix-
favoring amino acids.
The parental micro-protein may instead be a polypeptide
composed of residues 9-24 and 31-40 of aprotinin and
possessing two disulfides (Cys9-Cys22 and Cys14-Cys38).
Such a polypeptide would have the same disulfide bond
topology as α-conotoxin, and its two bridges would have
spans of 12 and 17, respectively.

Residues 23, 24 and 31 are variegated to encode the
amino acid residue set [G,S,R,D,N,H,P,T,A] so that a
sequence that favors a turn of the necessary geometry is
found. We use trypsin or anhydrotrypsin as the affinity
molecule to enrich for GPs that display a micro-protein that
folds into a stable structure similar to BPTI in the P1
region.

Three Disulfide Bond Parental Micro-Proteins

The cone snails (Conus) produce venoms (conotoxins)
which are 10-30 amino acids in length and exceptionally rich
in disulfide bonds. They are therefore archetypal micro-
proteins. Novel micro-proteins with three disulfide bonds
may be modelled after the μ-(GIIIA, GIIIB, GIIIC) or
ω-(GVIA, GVIB, GVIC, GVIIA, GVIIB, MVIIA, MVIIB, etc.)
conotoxins. The μ-conotoxins have the following conserved
structure:

        1 2         3         1'        2'3'
(2 AAs)-C-C-(5 AAs)-C-(4 AAs)-C-(4 AAs)-C-C-AA



No 3D structure of a μ-conotoxin has been published.
Hidaka et al. (HIDA90) have established the connectivity of
the disulfides. The following diagram depicts geographu-
toxin I (also known as μ-conotoxin GIIIA).

R1-D2-C3-C4-T5-P6-P7-K8-K9-C10-K11-D12-R13-Q14-C15-K16-P17-Q18-R19-C20-C21-A22

Disulfides: C3::C15, C4::C20, C10::C21



The connection from R19 to C20 could go over or under the
strand from Q14 to C15. One preferred form of variegation
is to vary the residues in one loop. Because the longest
loop contains only five amino acids, it is appropriate to
also vary the residues connected to the cysteines that form
the loop. For example, we might vary residues 5 through 9
plus 2, 11, 19, and 22. Another useful variegation would be
to vary residues 11-14 and 16-19, each through eight amino
acids. Concerning μ conotoxins, see BECK89b, BECK89c,
CRUZ89, and HIDA90.

The ω-conotoxins may be represented as follows:

1         2         3 1'          2'          3'
C-(6 AAs)-C-(6 AAs)-C-C-(2-3 AAs)-C-(4-6 AAs)-C

The King Kong peptide has the same disulfide arrangement as
the ω-conotoxins but a different biological activity.
Woodward et al. (WOOD90) report the sequences of three
homologous proteins from C. textile. Within the mature
toxin domain, only the cysteines are conserved. The spacing
of the cysteines is exactly conserved, but no other position
has the same amino acid in all three sequences and only a
few positions show even pair-wise matches. Thus we conclude
that all positions (except the cysteines) may be substituted
freely with a high probability that a stable disulfide
structure will form. Concerning ω conotoxins, see HILL89
and SUNX87.

Another micro-protein which may be used as a parental
binding domain is the Cucurbita maxima trypsin inhibitor I
(CMTI-I); CMTI-III is also appropriate. They are members of
the squash family of serine protease inhibitors, which also
includes inhibitors from summer squash, zucchini, and
cucumbers (WIEC85). McWherter et al. (MCWH89) describe
synthetic sequence-variants of the squash-seed protease
inhibitors that have affinity for human leukocyte elastase
and cathepsin G. Of course, any member of this family might
be used.

CMTI-I is one of the smallest proteins known, compris-
ing only 29 amino acids held in a fixed conformation by
three disulfide bonds. The structure has been studied by
Bode and colleagues using both X-ray diffraction (BODE89)
and NMR (HOLA89a,b). CMTI-I is of ellipsoidal shape; it
lacks helices or β-sheets, but consists of turns and
connecting short polypeptide stretches. The disulfide
pairing is Cys3-Cys20, Cys10-Cys22 and Cys16-Cys28. In the
CMTI-I:trypsin complex studied by Bode et al., 13 of the 29
inhibitor residues are in direct contact with trypsin; most
of them are in the primary binding segment Val2 (P4)-Glu9
(P4') which contains the reactive site bond Arg5 (P1)-Ile6
and is in a conformation observed also for other serine
proteinase inhibitors.

CMTI-I has a Ki for trypsin of ~1.5 · 10^-12 M. McWherter
et al. suggested substitution of "moderately bulky hydropho-
bic groups" at P1 to confer HLE specificity. They found
that a wider set of residues (VAL, ILE, LEU, ALA, PHE, MET,
and GLY) gave detectable binding to HLE. For cathepsin G,
they expected bulky (especially aromatic) side groups to be
strongly preferred. They found that PHE, LEU, MET, and ALA
were functional by their criteria; they did not test TRP,
TYR, or HIS. (Note that ALA has the second smallest side
group available.)

A preferred initial variegation strategy would be to
vary some or all of the residues ARG1, VAL2, PRO4, ARG5, ILE6,
LEU7, MET8, GLU9, LYS11, HIS25, GLY26, TYR27, and GLY29. If the
target were HNE, for example, one could synthesize DNA
embodying the following possibilities:

             vg       Allowed           #AA seqs/
Parental     Codon    amino acids       #DNA seqs
ARG1         VNT      RSLPHITNVADG      12/12
VAL2         NWT      VILFYHND          8/8
PRO4         VYT      PLTIAV            6/6
ARG5         VNT      RSLPHITNVADG      12/12
ILE6         NNK      all 20            20/31
LEU7         VWG      LQMKVE            6/6
TYR27        NAS      YHQNKDE           7/8

This allows about 5.81 · 10^6 amino-acid sequences encoded by
about 1.03 · 10^7 DNA sequences. A library comprising 5.0 · 10^7
independent transformants would give ~99% of the possible
sequences. Other variegation schemes could also be used.

Other inhibitors of this family include:

Trypsin inhibitor I from Citrullus vulgaris (OTLE87),
Trypsin inhibitor II from Bryonia dioica (OTLE87),
Trypsin inhibitor I from Cucurbita maxima (in OTLE87),
Trypsin inhibitor III from Cucurbita maxima (in OTLE87),
Trypsin inhibitor IV from Cucurbita maxima (in OTLE87),
Trypsin inhibitor II from Cucurbita pepo (in OTLE87),
Trypsin inhibitor III from Cucurbita pepo (in OTLE87),
Trypsin inhibitor IIb from Cucumis sativus (in OTLE87),
Trypsin inhibitor IV from Cucumis sativus (in OTLE87),
Trypsin inhibitor II from Ecballium elaterium (FAVE89), and
Inhibitor CM-1 from Momordica repens (in OTLE87).
Another micro-protein that may be used as an initial
potential binding domain is the heat-stable enterotoxins
derived from some enterotoxigenic E. coli, Citrobacter
freundii, and other bacteria (GUAR89). These micro-proteins
are known to be secreted from E. coli and are extremely
stable. Works related to synthesis, cloning, expression and
properties of these proteins include: BHAT86, SEKI85,
SHIM87, TAKA85, TAKE90, THOM85a,b, YOSH85, DALL90, DWAR89,
GARI87, GUZM89, GUZM90, HOUG84, KOBO89, KUPE90, OKAM87,
OKAM88, and OKAM90.

EXAMPLE IV

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF CU(II), ONE
CYSTEINE, TWO HISTIDINES, AND ONE METHIONINE.

Sequences such as
HIS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-CYS and
CYS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-HIS are
likely to combine with Cu(II) to form structures as shown in
the diagram:

           Xaa7-----Xaa8
           /           \
        Xaa6           Xaa9
         |               |
        Xaa5           Xaa10
          \             /
          MET4       HIS11
         /    \     /    \
        /      \   /      \
     GLY3       Cu       ASN12
      |        /  \        |
     ASN2-HIS1     CYS14-GLY13
           |         |
          NH2       COO


           Xaa7-----Xaa8
           /           \
        Xaa6           Xaa9
         |               |
        Xaa5           Xaa10
          \             /
          MET4       HIS11
         /    \     /    \
        /      \   /      \
     GLY3       Cu       ASN12
      |        /  \        |
     ASN2-CYS1     HIS14-GLY13
           |         |
          NH2       COO



Other arrangements of HIS, MET, HIS, and CYS along the chain
are also likely to form similar structures. The amino acids
ASN-GLY at positions 2 and 3 and at positions 12 and 13 give
the amino acids that carry the metal-binding ligands enough
flexibility for them to come together and bind the metal.
Other connecting sequences may be used, e.g., GLY-ASN, SER-
GLY, GLY-PRO, GLY-PRO-GLY, or PRO-GLY-ASN could be used. It
is also possible to vary one or more residues in the loops
that join the first and second or the third and fourth
metal-binding residues. For example,

           Xaa8-----Xaa9
           /           \
        Xaa7           Xaa10
         |               |
        Xaa6           Xaa11
          \             /
          MET5       HIS12
         /    \     /    \
     Xaa4      \   /      \
      |         Cu       ASN13
     PRO3      /  \        |
       \      /    \       |
       GLY2-HIS1   CYS15-GLY14
             |       |
            NH2     COO

is likely to form the diagrammed structure for a wide
variety of amino acids at Xaa4. It is expected that the 
side groups of Xaa4 and XaaS will be close together and on 
the surface of the mini -protein. 
The variable amino acids are held so that they have
limited flexibility. This cross-linkage has some differ-
ences from the disulfide linkage. The separation between
Cα4 and Cα11 is greater than the separation of the Cαs of a
cystine. In addition, the interaction of residues 1 through
4 and 11 through 14 with the metal ion is expected to limit
the motion of residues 5 through 10 more than a disulfide
between residues 4 and 11. A single disulfide bond exerts
strong distance constraints on the α carbons of the joined
residues, but very little directional constraint on, for
example, the vector from N to C in the main-chain.

For the desired sequence, the side groups of residues
5 through 10 can form specific interactions with the target.
Other numbers of variable amino acids, for example, 4, 5, 7,
or 3, are appropriate. Larger spans may be used when the
enclosed sequence contains segments having a high potential
to form α helices or other secondary structure that limits
the conformational freedom of the polypeptide main chain.
Whereas a mini-protein having four CYSs could form three
distinct pairings, a mini-protein having two HISs, one MET,
and one CYS can form only two distinct complexes with Cu.
These two structures are related by mirror symmetry through
the Cu. Because the two HISs are distinguishable, the
structures are different.
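
The count of three distinct pairings for four cysteines can be confirmed by enumerating every way of partitioning the cysteines into disulfide-bonded pairs. The sketch below is illustrative; the function name is hypothetical and the positions used are those of α-conotoxin GI as numbered earlier in this section.

```python
def disulfide_pairings(cysteines):
    """Yield every way to pair up an even number of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in disulfide_pairings(remaining):
            yield [(first, partner)] + tail

# Four cysteines at the positions of alpha-conotoxin GI.
four = list(disulfide_pairings([2, 3, 7, 13]))
for pairing in four:
    print(pairing)
print(len(four), "pairings for four cysteines")

# Six cysteines (three disulfides) give 5 * 3 * 1 = 15 possible pairings.
six = list(disulfide_pairings([1, 2, 3, 4, 5, 6]))
print(len(six), "pairings for six cysteines")
```

In general 2n cysteines admit (2n-1)·(2n-3)·...·1 pairings, which is why a cross-link with fewer possible arrangements can be attractive.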

When such metal-containing mini-proteins are displayed
on filamentous phage, the cells that produce the phage can
be grown in the presence of the appropriate metal ion, or
the phage can be exposed to the metal only after they are
separated from the cells.

EXAMPLE V

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF ZN(II) AND
FOUR CYSTEINES

A cross link similar to the one shown in Example IV is
exemplified by the zinc-finger proteins (GIBS88, GAUS87,
PARR88, FRAN87, CHOW87, HARD90). One family of zinc-fingers
has two CYS and two HIS residues in conserved positions that
bind Zn++ (PARR88, FRAN87, CHOW87, EVAN88, BERG88, CHAV88).
Gibson et al. (GIBS88) review a number of sequences thought
to form zinc-fingers and propose a three-dimensional model
for these compounds. Most of these sequences have two CYS
and two HIS residues in conserved positions, but some have
three CYS and one HIS residue. Gauss et al. (GAUS87) also
report a zinc-finger protein having three CYS and one HIS
residues that bind zinc. Hard et al. (HARD90) report the 3D
structure of a protein that comprises two zinc-fingers, each
of which has four CYS residues. All of these zinc-binding
proteins are stable in the reducing intracellular environ-
ment.

One preferred example of a CYS::zinc cross linked mini-
protein comprises residues 440 to 461 of the sequence shown
in Figure 1 of HARD90. The residues 444 through 456 may be
variegated. One such variegation is as follows:

Parental    Allowed                                   #AA / #DNA
SER444      SER, ALA                                  2 / 2
ASP445      ASP, ASN, LYS, [illegible]                4 / 4
GLU446      GLU, LYS, GLN                             3 / 3
ALA447      ALA, THR, GLY, SER                        4 / 4
SER448      SER, ALA                                  2 / 2
GLY449      GLY, SER, [illegible], [illegible]        4 / 4
CYS450      CYS, PHE, [illegible], [illegible]        4 / 4
HIS451      HIS, GLN, LYS, ASP, GLU, [illegible]      6 / 6
TYR452      TYR, PHE, HIS, LEU                        4 / 4
GLY453      GLY, SER, ASN, ASP                        4 / 4
VAL454      VAL, ALA, ASP, GLY, SER, ASN, THR, ILE    8 / 8
LEU455      LEU, HIS, ASP, VAL                        4 / 4
THR456      THR, ILE, ASN, SER                        4 / 4



This leads to 3.77 · 10^7 DNA sequences that encode the same
number of amino-acid sequences. A library having 1.0 · 10^8
independent transformants will display 93% of the allowed
sequences; 2.0 · 10^8 independent transformants will display
99.5% of allowed sequences.

Table 2: Preferred Outer-Surface Proteins



Genetic        Preferred
Package        Outer-Surface      Reason for
               Protein            preference

M13            coat protein       a) exposed amino terminus,
               (gpVIII)           b) predictable post-translational
                                     processing,
                                  c) numerous copies in virion,
                                  d) fusion data available.

               gp III             a) fusion data available,
                                  b) amino terminus exposed,
                                  c) working example available.

PhiX174        G protein          a) known to be on virion exterior,
                                  b) small enough that the G-ipbd
                                     gene can replace H gene.

E. coli        LamB               a) fusion data available,
                                  b) non-essential.

               OmpC               a) topological model,
                                  b) non-essential; abundant.

               OmpA               a) topological model,
                                  b) non-essential; abundant,
                                  c) homologues in other genera.

               [illegible]        a) topological model,
                                  b) non-essential; abundant.

               PhoE               a) topological model,
                                  b) non-essential; abundant,
                                  c) inducible.

B. subtilis    CotC               a) no post-translational processing,
spores                            b) distinctive sequence that causes
                                     protein to localize in spore coat,
                                  c) non-essential.

               CotD               Same as for CotC.

Table 10: Abundances obtained
from various vgCodons

A. Optimized fxS Codon, Restrained by [D] + [E] = [K] + [R]

            T       C       A       G
     1     .26     .18     .26     .30
     2     .22     .16     .40     .22
     3     .5      .0      .0      .5

     Amino                        Amino
     acid     Abundance           acid     Abundance
     A        4.80%               C        2.86%
     D        6.00%               E        6.00%
     F        2.86%               G        6.60%
     H        3.60%               I        2.86%
     K        5.20%               L        6.82%
     M        2.86%               N        5.20%
     P        2.88%               Q        3.60%
     R        6.82%               S        7.02% mfaa
     T        4.16%               V        6.60%
     W        2.86% lfaa          Y        5.20%
     stop     5.20%

     [D] + [E] = [K] + [R] = .12
     ratio = Abun(W)/Abun(S) = 0.4074

     j     (1/ratio)^j     (ratio)^j       stop-free
     1       2.454         .4074           .9480
     2       6.025         .1660           .8987
     3      14.788         .0676           .8520
     4      36.298         .0275           .8077
     5      89.095         .0112           .7657
     6     218.7           4.57 · 10^-3    .7258
     7     536.8           1.86 · 10^-3    .6881
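
The abundances in part A can be recomputed from the nucleotide distribution at each codon position. The sketch below is an illustration only: it assumes the standard genetic code and independent base incorporation at the three positions, and the function and variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def abundances(dist):
    """Sum codon probabilities per amino acid; dist holds one base->fraction dict per position."""
    result = {}
    for codon in product(BASES, repeat=3):
        p = dist[0][codon[0]] * dist[1][codon[1]] * dist[2][codon[2]]
        aa = CODON_TABLE["".join(codon)]
        result[aa] = result.get(aa, 0.0) + p
    return result

# Nucleotide fractions from part A of Table 10 (rows are codon positions 1-3).
TABLE_10A = [
    {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30},
    {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22},
    {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5},
]

ab = abundances(TABLE_10A)
sense = {aa: p for aa, p in ab.items() if aa != "*" and p > 0}
mfaa = max(sense, key=sense.get)            # most-favored amino acid
ratio = min(sense.values()) / sense[mfaa]   # least-favored / most-favored abundance
stop_free = 1.0 - ab["*"]                   # probability that one codon is not a stop

for j in range(1, 8):
    print(j, f"{(1 / ratio) ** j:.3f}", f"{ratio ** j:.4g}", f"{stop_free ** j:.4f}")
```

Several amino acids tie with W for the lowest abundance in this distribution, so the ratio is computed from the minimum value rather than from W by name.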
Table 10: Abundances obtained
from various vgCodons
(continued)

B. Unrestrained, optimized

              T      C      A      G
       1     .27    .19    .27    .27
       2     .21    .15    .43    .21
       3     .5     .0     .0     .5

   Amino                       Amino
   acid     Abundance          acid     Abundance
     A        4.05%              C        2.84%
     D        5.81%              E        5.81%
     F        2.84%              G        5.67%
     H        4.08%              I        2.84%
     K        5.81%              L        6.83%
     M        2.84%              N        5.81%
     P        2.85%              Q        4.08%
     R        6.83%              S        6.89% mfaa
     T        4.05%              V        5.67%
     W        2.84% lfaa         Y        5.81%
     stop     5.81%

   [D] + [E] = 0.1162     [K] + [R] = 0.1264
   ratio = Abun(W)/Abun(S) = 0.41176

     j    (1/ratio)^j     (ratio)^j      stop-free
     1       2.4286        .41176          .9419
     2       5.8981        .16955          .8872
     3      14.3241        .06981          .8356
     4      34.7875        .02875          .7871
     5      84.4849        .011836         .74135
     6     205.180         .004874         .69828
     7     498.3          2.007·10^-3      .6577
Table 10: Abundances obtained
from various vgCodons
(continued)

C. Optimized NNT

              T       C       A       G
       1     .2071   .2929   .2071   .2929
       2     .2929   .2071   .2929   .2071
       3    1.       .0      .0      .0

   Amino                       Amino
   acid     Abundance          acid     Abundance
     A        6.06%              C        4.29%
     D        8.58%              E        none
     F        6.06%              G        6.06%
     H        8.58%              I        6.06%
     K        none               L        8.58%
     M        none               N        6.06%
     P        6.06%              Q        none
     R        6.06%              S        8.58% mfaa
     T        4.29% lfaa         V        8.58%
     W        none               Y        6.06%
     stop     none

     j    (1/ratio)^j     (ratio)^j      stop-free
     1       2.0           .5              1.
     2       4.0           .25             1.
     3       8.0           .125            1.
     4      16.0           .0625           1.
     5      32.0           .03125          1.
     6      64.0           .015625         1.
     7     128.0           .0078125        1.
Table 10: Abundances obtained
from various vgCodons
(continued)

D. Optimized NNG

              T      C      A      G
       1     .23    .21    .23    .33
       2     .215   .285   .285   .215
       3     .0     .0     .0    1.0

   Amino                       Amino
   acid     Abundance          acid     Abundance
     A        9.40%              C        none
     D        none               E        9.40%
     F        none               G        7.10%
     H        none               I        none
     K        6.60%              L        9.50% mfaa
     M        4.90%              N        none
     P        6.00%              Q        6.00%
     R        9.50%              S        6.60%
     T        6.60%              V        7.10%
     W        4.90% lfaa         Y        none
     stop     6.60%

     j    (1/ratio)^j     (ratio)^j      stop-free
     1       1.9388        .51579         0.934
     2       3.7588        .26604         0.8723
     3       7.2876        .13722         0.8148
     4      14.1289        .07078         0.7610
     5      27.3929       3.65·10^-2      0.7108
     6      53.109        1.88·10^-2      0.6639
     7     102.96         9.72·10^-3      0.6200
Table 10: Abundances obtained
from various vgCodons
(continued)

E. Unoptimized NNS (NNK gives identical distribution)

              T      C      A      G
       1     .25    .25    .25    .25
       2     .25    .25    .25    .25
       3     .0     .5     .0     .5

   Amino                       Amino
   acid     Abundance          acid     Abundance
     A        6.25%              C        3.125%
     D        3.125%             E        3.125%
     F        3.125%             G        6.25%
     H        3.125%             I        3.125%
     K        3.125%             L        9.375%
     M        3.125%             N        3.125%
     P        6.25%              Q        3.125%
     R        9.375%             S        9.375%
     T        6.25%              V        6.25%
     W        3.125%             Y        3.125%
     stop     3.125%

     j    (1/ratio)^j     (ratio)^j      stop-free
     1       3.0           .33333          .96875
     2       9.0           .11111          .9385
     3      27.0           .03704          .90915
     4      81.0           .01234567       .8807
     5     243.0           .0041152        .8532
     6     729.0          1.37·10^-3       .82655
     7    2187.0          4.57·10^-4       .8007
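
The three columns that close each part of Table 10 follow directly from the abundances of the least-favored amino acid (lfaa), the most-favored amino acid (mfaa), and the stop codon. A minimal sketch, assuming that j independently variegated codons are used and that ratio = Abun(lfaa)/Abun(mfaa); the function name `sampling_powers` is illustrative.

```python
def sampling_powers(lfaa, mfaa, stop, j_max=7):
    """Return rows (j, (1/ratio)**j, ratio**j, stop-free fraction) for j = 1..j_max."""
    ratio = lfaa / mfaa
    rows = []
    for j in range(1, j_max + 1):
        rows.append((j, (1.0 / ratio) ** j, ratio ** j, (1.0 - stop) ** j))
    return rows


# Unoptimized NNS: lfaa 3.125%, mfaa 9.375%, stop 3.125% (Table 10, part E).
rows = sampling_powers(0.03125, 0.09375, 0.03125)
for j, inverse, direct, stop_free in rows:
    print(f"{j}  {inverse:8.1f}  {direct:.3e}  {stop_free:.5f}")
```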
Table 130: Sampling of a Library encoded by (NNK)6

A. Numbers of hexapeptides in each class

   total = 64,000,000 stop-free sequences.
   α can be one of [WMFYCIKDENHQ]
   Φ can be one of [PTAVG]
   Ω can be one of [SLR]

   αααααα  =  2985984         Φααααα  =  7464960
   Ωααααα  =  4478976         ΦΦαααα  =  7776000
   ΦΩαααα  =  9331200         ΩΩαααα  =  2799360
   ΦΦΦααα  =  4320000         ΦΦΩααα  =  7776000
   ΦΩΩααα  =  4665600         ΩΩΩααα  =   933120
   ΦΦΦΦαα  =  1350000         ΦΦΦΩαα  =  3240000
   ΦΦΩΩαα  =  2916000         ΦΩΩΩαα  =  1166400
   ΩΩΩΩαα  =   174960         ΦΦΦΦΦα  =   225000
   ΦΦΦΦΩα  =   675000         ΦΦΦΩΩα  =   810000
   ΦΦΩΩΩα  =   486000         ΦΩΩΩΩα  =   145800
   ΩΩΩΩΩα  =    17496         ΦΦΦΦΦΦ  =    15625
   ΦΦΦΦΦΩ  =    56250         ΦΦΦΦΩΩ  =    84375
   ΦΦΦΩΩΩ  =    67500         ΦΦΩΩΩΩ  =    30375
   ΦΩΩΩΩΩ  =     7290         ΩΩΩΩΩΩ  =      729

ΦΦΩΩαα, for example, stands for the set of peptides having
two amino acids from the α class, two from Φ, and two from
Ω arranged in any order. There are, for example, 729 = 3^6
sequences composed entirely of S, L, and R.
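
The class sizes of part A are multinomial counts. The following sketch reproduces them under the assumption that a class is fixed by the number of Φ and Ω residues, with the remaining positions drawn from α; the names `class_count` and `counts` are illustrative.

```python
from math import comb

# Number of amino acids in each class of the NNK code:
# alpha (one codon each), phi (two codons each), omega (three codons each).
N_ALPHA, N_PHI, N_OMEGA = 12, 5, 3


def class_count(n_phi, n_omega, length=6):
    """Number of peptides with n_phi residues from phi and n_omega from omega."""
    n_alpha = length - n_phi - n_omega
    # Ways to choose which positions hold phi, then which of the rest hold omega.
    arrangements = comb(length, n_phi) * comb(length - n_phi, n_omega)
    return arrangements * N_PHI ** n_phi * N_OMEGA ** n_omega * N_ALPHA ** n_alpha


counts = {(p, o): class_count(p, o) for p in range(7) for o in range(7 - p)}
print(len(counts), "classes,", sum(counts.values()), "stop-free hexapeptides")
```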
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

B. Probability that any given stop-free DNA
   sequence will encode a hexapeptide from a
   stated class.

   Class        Probability     (% of class)
   αααααα...    3.364E-03       (1.13E-07)
   Φααααα...    1.682E-02       (2.25E-07)
   Ωααααα...    1.514E-02       (3.38E-07)
   ΦΦαααα...    3.505E-02       (4.51E-07)
   ΦΩαααα...    6.308E-02       (6.76E-07)
   ΩΩαααα...    2.839E-02       (1.01E-06)
   ΦΦΦααα...    3.894E-02       (9.01E-07)
   ΦΦΩααα...    1.051E-01       (1.35E-06)
   ΦΩΩααα...    9.463E-02       (2.03E-06)
   ΩΩΩααα...    2.839E-02       (3.04E-06)
   ΦΦΦΦαα...    2.434E-02       (1.80E-06)
   ΦΦΦΩαα...    8.762E-02       (2.70E-06)
   ΦΦΩΩαα...    1.183E-01       (4.06E-06)
   ΦΩΩΩαα...    7.097E-02       (6.08E-06)
   ΩΩΩΩαα...    1.597E-02       (9.13E-06)
   ΦΦΦΦΦα...    8.113E-03       (3.61E-06)
   ΦΦΦΦΩα...    3.651E-02       (5.41E-06)
   ΦΦΦΩΩα...    6.571E-02       (8.11E-06)
   ΦΦΩΩΩα...    5.914E-02       (1.22E-05)
   ΦΩΩΩΩα...    2.661E-02       (1.83E-05)
   ΩΩΩΩΩα...    4.790E-03       (2.74E-05)
   ΦΦΦΦΦΦ...    1.127E-03       (7.21E-06)
   ΦΦΦΦΦΩ...    6.084E-03       (1.08E-05)
   ΦΦΦΦΩΩ...    1.369E-02       (1.62E-05)
   ΦΦΦΩΩΩ...    1.643E-02       (2.43E-05)
   ΦΦΩΩΩΩ...    1.109E-02       (3.65E-05)
   ΦΩΩΩΩΩ...    3.992E-03       (5.48E-05)
   ΩΩΩΩΩΩ...    5.988E-04       (8.21E-05)
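
The probabilities of part B can be recovered by weighting each class by the number of codons that encode its members. The sketch assumes 31 stop-free NNK codons per position and one, two, or three codons per α, Φ, or Ω residue respectively; `class_probability` and `member_percent` are illustrative names.

```python
from math import comb


def class_probability(n_phi, n_omega, length=6):
    """Chance that a stop-free (NNK)6 DNA sequence encodes a member of the class."""
    n_alpha = length - n_phi - n_omega
    members = (comb(length, n_phi) * comb(length - n_phi, n_omega)
               * 5 ** n_phi * 3 ** n_omega * 12 ** n_alpha)
    dna_per_member = 2 ** n_phi * 3 ** n_omega   # codon multiplicity of one peptide
    return members * dna_per_member / 31 ** length


def member_percent(n_phi, n_omega, length=6):
    """Percent chance that a DNA sequence encodes one particular member of the class."""
    return 100.0 * 2 ** n_phi * 3 ** n_omega / 31 ** length


probabilities = {(p, o): class_probability(p, o)
                 for p in range(7) for o in range(7 - p)}
print(f"{probabilities[(0, 0)]:.3E}  ({member_percent(0, 0):.2E})")
```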
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

C. Number of different stop-free amino-acid
   sequences in each class expected for various
   library sizes

Library size = 1.0000E+06
total = 9.7446E+05     % sampled = 1.52

   Class          Number      %         Class          Number      %
   αααααα...      3362.6 (   .1)        Φααααα...     16803.4 (   .2)
   Ωααααα...     15114.6 (   .3)        ΦΦαααα...     34967.8 (   .4)
   ΦΩαααα...     62871.1 (   .7)        ΩΩαααα...     28244.3 (  1.0)
   ΦΦΦααα...     38765.7 (   .9)        ΦΦΩααα...    104432.2 (  1.3)
   ΦΩΩααα...     93672.7 (  2.0)        ΩΩΩααα...     27960.3 (  3.0)
   ΦΦΦΦαα...     24119.9 (  1.8)        ΦΦΦΩαα...     86442.5 (  2.7)
   ΦΦΩΩαα...    115915.5 (  4.0)        ΦΩΩΩαα...     68853.5 (  5.9)
   ΩΩΩΩαα...     15261.1 (  8.7)        ΦΦΦΦΦα...      7968.1 (  3.5)
   ΦΦΦΦΩα...     35537.2 (  5.3)        ΦΦΦΩΩα...     63117.5 (  7.8)
   ΦΦΩΩΩα...     55684.4 ( 11.5)        ΦΩΩΩΩα...     24325.9 ( 16.7)
   ΩΩΩΩΩα...      4190.6 ( 24.0)        ΦΦΦΦΦΦ...      1087.1 (  7.0)
   ΦΦΦΦΦΩ...      5767.0 ( 10.3)        ΦΦΦΦΩΩ...     12637.2 ( 15.0)
   ΦΦΦΩΩΩ...     14581.7 ( 21.6)        ΦΦΩΩΩΩ...      9290.2 ( 30.6)
   ΦΩΩΩΩΩ...      3073.9 ( 42.2)        ΩΩΩΩΩΩ...       408.4 ( 56.0)



Library size = 3.0000E+06
total = 2.7885E+06     % sampled = 4.36

   Class          Number      %         Class          Number      %
   αααααα...     10076.4 (   .3)        Φααααα...     50296.9 (   .7)
   Ωααααα...     45190.9 (  1.0)        ΦΦαααα...    104432.2 (  1.3)
   ΦΩαααα...    187345.5 (  2.0)        ΩΩαααα...     83880.9 (  3.0)
   ΦΦΦααα...    115256.6 (  2.7)        ΦΦΩααα...    309107.9 (  4.0)
   ΦΩΩααα...    275413.9 (  5.9)        ΩΩΩααα...     81392.5 (  8.7)
   ΦΦΦΦαα...     71074.5 (  5.3)        ΦΦΦΩαα...    252470.2 (  7.8)
   ΦΦΩΩαα...    334106.2 ( 11.5)        ΦΩΩΩαα...    194606.9 ( 16.7)
   ΩΩΩΩαα...     41905.9 ( 24.0)        ΦΦΦΦΦα...     23067.8 ( 10.3)
   ΦΦΦΦΩα...    101097.3 ( 15.0)        ΦΦΦΩΩα...    174981.0 ( 21.6)
   ΦΦΩΩΩα...    148643.7 ( 30.6)        ΦΩΩΩΩα...     61478.9 ( 42.2)
   ΩΩΩΩΩα...      9801.0 ( 56.0)        ΦΦΦΦΦΦ...      3039.6 ( 19.5)
   ΦΦΦΦΦΩ...     15587.7 ( 27.7)        ΦΦΦΦΩΩ...     32516.8 ( 38.5)
   ΦΦΦΩΩΩ...     34975.6 ( 51.8)        ΦΦΩΩΩΩ...     20215.5 ( 66.6)
   ΦΩΩΩΩΩ...      5879.9 ( 80.7)        ΩΩΩΩΩΩ...       667.0 ( 91.5)
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

Library size = 1.0000E+07
total = 8.1204E+06     % sampled = 12.69

   Class          Number      %         Class          Number      %
   αααααα...     33455.9 (  1.1)        Φααααα...    166342.4 (  2.2)
   Ωααααα...    148871.1 (  3.3)        ΦΦαααα...    342685.7 (  4.4)
   ΦΩαααα...    609987.6 (  6.5)        ΩΩαααα...    269958.3 (  9.6)
   ΦΦΦααα...    372371.8 (  8.6)        ΦΦΩααα...    983416.4 ( 12.6)
   ΦΩΩααα...    856471.6 ( 18.4)        ΩΩΩααα...    244761.5 ( 26.2)
   ΦΦΦΦαα...    222702.0 ( 16.5)        ΦΦΦΩαα...    767692.5 ( 23.7)
   ΦΦΩΩαα...    972324.6 ( 33.3)        ΦΩΩΩαα...    531651.3 ( 45.6)
   ΩΩΩΩαα...    104722.3 ( 59.9)        ΦΦΦΦΦα...     68111.0 ( 30.3)
   ΦΦΦΦΩα...    281976.3 ( 41.8)        ΦΦΦΩΩα...    450120.2 ( 55.6)
   ΦΦΩΩΩα...    342072.1 ( 70.4)        ΦΩΩΩΩα...    122302.6 ( 83.9)
   ΩΩΩΩΩα...     16364.0 ( 93.5)        ΦΦΦΦΦΦ...      8028.0 ( 51.4)
   ΦΦΦΦΦΩ...     37179.9 ( 66.1)        ΦΦΦΦΩΩ...     67719.5 ( 80.3)
   ΦΦΦΩΩΩ...     61580.0 ( 91.2)        ΦΦΩΩΩΩ...     29586.1 ( 97.4)
   ΦΩΩΩΩΩ...      7259.5 ( 99.6)        ΩΩΩΩΩΩ...       728.8 (100.0)



Library size = 3.0000E+07
total = 1.8633E+07     % sampled = 29.11

   Class          Number      %         Class          Number      %
   αααααα...     99247.4 (  3.3)        Φααααα...    487990.0 (  6.5)
   Ωααααα...    431933.3 (  9.6)        ΦΦαααα...    983416.5 ( 12.6)
   ΦΩαααα...   1712943.0 ( 18.4)        ΩΩαααα...    734284.6 ( 26.2)
   ΦΦΦααα...   1023590.0 ( 23.7)        ΦΦΩααα...   2592866.0 ( 33.3)
   ΦΩΩααα...   2126605.0 ( 45.6)        ΩΩΩααα...    558519.0 ( 59.9)
   ΦΦΦΦαα...    563952.6 ( 41.8)        ΦΦΦΩαα...   1800481.0 ( 55.6)
   ΦΦΩΩαα...   2052433.0 ( 70.4)        ΦΩΩΩαα...    978420.5 ( 83.9)
   ΩΩΩΩαα...    163640.3 ( 93.5)        ΦΦΦΦΦα...    148719.7 ( 66.1)
   ΦΦΦΦΩα...    541755.7 ( 80.3)        ΦΦΦΩΩα...    738960.1 ( 91.2)
   ΦΦΩΩΩα...    473377.0 ( 97.4)        ΦΩΩΩΩα...    145189.7 ( 99.6)
   ΩΩΩΩΩα...     17491.3 (100.0)        ΦΦΦΦΦΦ...     13829.1 ( 88.5)
   ΦΦΦΦΦΩ...     54058.1 ( 96.1)        ΦΦΦΦΩΩ...     83726.0 ( 99.2)
   ΦΦΦΩΩΩ...     67454.5 ( 99.9)        ΦΦΩΩΩΩ...     30374.5 (100.0)
   ΦΩΩΩΩΩ...      7290.0 (100.0)        ΩΩΩΩΩΩ...       729.0 (100.0)
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

Library size = 7.6000E+07
total = 3.2125E+07     % sampled = 50.19

   Class          Number      %         Class          Number      %
   αααααα...    245057.8 (  8.2)        Φααααα...   1175010.0 ( 15.7)
   Ωααααα...   1014733.0 ( 22.7)        ΦΦαααα...   2255280.0 ( 29.0)
   ΦΩαααα...   3749112.0 ( 40.2)        ΩΩαααα...   1504128.0 ( 53.7)
   ΦΦΦααα...   2142478.0 ( 49.6)        ΦΦΩααα...   4993247.0 ( 64.2)
   ΦΩΩααα...   3666785.0 ( 78.6)        ΩΩΩααα...    840691.9 ( 90.1)
   ΦΦΦΦαα...   1007002.0 ( 74.6)        ΦΦΦΩαα...   2825063.0 ( 87.2)
   ΦΦΩΩαα...   2782358.0 ( 95.4)        ΦΩΩΩαα...   1154956.0 ( 99.0)
   ΩΩΩΩαα...    174790.0 ( 99.9)        ΦΦΦΦΦα...    210475.6 ( 93.5)
   ΦΦΦΦΩα...    663929.3 ( 98.4)        ΦΦΦΩΩα...    808298.6 ( 99.8)
   ΦΦΩΩΩα...    485953.2 (100.0)        ΦΩΩΩΩα...    145799.9 (100.0)
   ΩΩΩΩΩα...     17496.0 (100.0)        ΦΦΦΦΦΦ...     15559.9 ( 99.6)
   ΦΦΦΦΦΩ...     56234.9 (100.0)        ΦΦΦΦΩΩ...     84374.6 (100.0)
   ΦΦΦΩΩΩ...     67500.0 (100.0)        ΦΦΩΩΩΩ...     30375.0 (100.0)
   ΦΩΩΩΩΩ...      7290.0 (100.0)        ΩΩΩΩΩΩ...       729.0 (100.0)

Library size = 1.0000E+08
total = 3.6537E+07     % sampled = 57.09

   Class          Number      %         Class          Number      %
   αααααα...    318185.1 ( 10.7)        Φααααα...   1506161.0 ( 20.2)
   Ωααααα...   1284677.0 ( 28.7)        ΦΦαααα...   2821285.0 ( 36.3)
   ΦΩαααα...   4585163.0 ( 49.1)        ΩΩαααα...   1783932.0 ( 63.7)
   ΦΦΦααα...   2566085.0 ( 59.4)        ΦΦΩααα...   5764391.0 ( 74.1)
   ΦΩΩααα...   4051713.0 ( 86.8)        ΩΩΩααα...    888584.3 ( 95.2)
   ΦΦΦΦαα...   1127473.0 ( 83.5)        ΦΦΦΩαα...   3023170.0 ( 93.3)
   ΦΦΩΩαα...   2865517.0 ( 98.3)        ΦΩΩΩαα...   1163743.0 ( 99.8)
   ΩΩΩΩαα...    174941.0 (100.0)        ΦΦΦΦΦα...    218886.6 ( 97.3)
   ΦΦΦΦΩα...    671976.9 ( 99.6)        ΦΦΦΩΩα...    809757.3 (100.0)
   ΦΦΩΩΩα...    485997.5 (100.0)        ΦΩΩΩΩα...    145800.0 (100.0)
   ΩΩΩΩΩα...     17496.0 (100.0)        ΦΦΦΦΦΦ...     15613.5 ( 99.9)
   ΦΦΦΦΦΩ...     56248.9 (100.0)        ΦΦΦΦΩΩ...     84375.0 (100.0)
   ΦΦΦΩΩΩ...     67500.0 (100.0)        ΦΦΩΩΩΩ...     30375.0 (100.0)
   ΦΩΩΩΩΩ...      7290.0 (100.0)        ΩΩΩΩΩΩ...       729.0 (100.0)
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

Library size = 3.0000E+08
total = 5.2634E+07     % sampled = 82.24

   Class          Number      %         Class          Number      %
   αααααα...    856451.3 ( 28.7)        Φααααα...   3668130.0 ( 49.1)
   Ωααααα...   2854291.0 ( 63.7)        ΦΦαααα...   5764391.0 ( 74.1)
   ΦΩαααα...   8103426.0 ( 86.8)        ΩΩαααα...   2665753.0 ( 95.2)
   ΦΦΦααα...   4030893.0 ( 93.3)        ΦΦΩααα...   7641378.0 ( 98.3)
   ΦΩΩααα...   4654972.0 ( 99.8)        ΩΩΩααα...    933018.6 (100.0)
   ΦΦΦΦαα...   1343954.0 ( 99.6)        ΦΦΦΩαα...   3239029.0 (100.0)
   ΦΦΩΩαα...   2915985.0 (100.0)        ΦΩΩΩαα...   1166400.0 (100.0)
   ΩΩΩΩαα...    174960.0 (100.0)        ΦΦΦΦΦα...    224995.5 (100.0)
   ΦΦΦΦΩα...    674999.9 (100.0)        ΦΦΦΩΩα...    810000.0 (100.0)
   ΦΦΩΩΩα...    486000.0 (100.0)        ΦΩΩΩΩα...    145800.0 (100.0)
   ΩΩΩΩΩα...     17496.0 (100.0)        ΦΦΦΦΦΦ...     15625.0 (100.0)
   ΦΦΦΦΦΩ...     56250.0 (100.0)        ΦΦΦΦΩΩ...     84375.0 (100.0)
   ΦΦΦΩΩΩ...     67500.0 (100.0)        ΦΦΩΩΩΩ...     30375.0 (100.0)
   ΦΩΩΩΩΩ...      7290.0 (100.0)        ΩΩΩΩΩΩ...       729.0 (100.0)



Library size = 1.0000E+09
total = 6.1999E+07     % sampled = 96.87

   Class          Number      %         Class          Number      %
   αααααα...   2018278.0 ( 67.6)        Φααααα...   6680917.0 ( 89.5)
   Ωααααα...   4326519.0 ( 96.6)        ΦΦαααα...   7690221.0 ( 98.9)
   ΦΩαααα...   9320389.0 ( 99.9)        ΩΩαααα...   2799250.0 (100.0)
   ΦΦΦααα...   4319475.0 (100.0)        ΦΦΩααα...   7775990.0 (100.0)
   ΦΩΩααα...   4665600.0 (100.0)        ΩΩΩααα...    933120.0 (100.0)
   ΦΦΦΦαα...   1350000.0 (100.0)        ΦΦΦΩαα...   3240000.0 (100.0)
   ΦΦΩΩαα...   2916000.0 (100.0)        ΦΩΩΩαα...   1166400.0 (100.0)
   ΩΩΩΩαα...    174960.0 (100.0)        ΦΦΦΦΦα...    225000.0 (100.0)
   ΦΦΦΦΩα...    675000.0 (100.0)        ΦΦΦΩΩα...    810000.0 (100.0)
   ΦΦΩΩΩα...    486000.0 (100.0)        ΦΩΩΩΩα...    145800.0 (100.0)
   ΩΩΩΩΩα...     17496.0 (100.0)        ΦΦΦΦΦΦ...     15625.0 (100.0)
   ΦΦΦΦΦΩ...     56250.0 (100.0)        ΦΦΦΦΩΩ...     84375.0 (100.0)
   ΦΦΦΩΩΩ...     67500.0 (100.0)        ΦΦΩΩΩΩ...     30375.0 (100.0)
   ΦΩΩΩΩΩ...      7290.0 (100.0)        ΩΩΩΩΩΩ...       729.0 (100.0)
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

Library size = 3.0000E+09
total = 6.3890E+07     % sampled = 99.83

   Class          Number      %         Class          Number      %
   αααααα...   2884346.0 ( 96.6)        Φααααα...   7456311.0 ( 99.9)
   Ωααααα...   4478800.0 (100.0)        ΦΦαααα...   7775990.0 (100.0)
   ΦΩαααα...   9331200.0 (100.0)        ΩΩαααα...   2799360.0 (100.0)
   ΦΦΦααα...   4320000.0 (100.0)        ΦΦΩααα...   7776000.0 (100.0)
   ΦΩΩααα...   4665600.0 (100.0)        ΩΩΩααα...    933120.0 (100.0)
   ΦΦΦΦαα...   1350000.0 (100.0)        ΦΦΦΩαα...   3240000.0 (100.0)
   ΦΦΩΩαα...   2916000.0 (100.0)        ΦΩΩΩαα...   1166400.0 (100.0)
   ΩΩΩΩαα...    174960.0 (100.0)        ΦΦΦΦΦα...    225000.0 (100.0)
   ΦΦΦΦΩα...    675000.0 (100.0)        ΦΦΦΩΩα...    810000.0 (100.0)
   ΦΦΩΩΩα...    486000.0 (100.0)        ΦΩΩΩΩα...    145800.0 (100.0)
   ΩΩΩΩΩα...     17496.0 (100.0)        ΦΦΦΦΦΦ...     15625.0 (100.0)
   ΦΦΦΦΦΩ...     56250.0 (100.0)        ΦΦΦΦΩΩ...     84375.0 (100.0)
   ΦΦΦΩΩΩ...     67500.0 (100.0)        ΦΦΩΩΩΩ...     30375.0 (100.0)
   ΦΩΩΩΩΩ...      7290.0 (100.0)        ΩΩΩΩΩΩ...       729.0 (100.0)
Table 130, continued

D. Formulae for tabulated quantities.

   Lsize is the number of independent transformants.
   31**6 is 31 to the sixth power; 6*3 means 6 times 3.
   A = Lsize/(31**6)
   α can be one of [WMFYCIKDENHQ]
   Φ can be one of [PTAVG]
   Ω can be one of [SLR]
   F0 = (12)**6     F1 = (12)**5     F2 = (12)**4
   F3 = (12)**3     F4 = (12)**2     F5 = (12)
   F6 = 1

   αααααα = F0 * (1-exp(-A))
   Φααααα = 6 * 5 * F1 * (1-exp(-2*A))
   Ωααααα = 6 * 3 * F1 * (1-exp(-3*A))
   ΦΦαααα = (15) * 5**2 * F2 * (1-exp(-4*A))
   ΦΩαααα = (6*5) * 5 * 3 * F2 * (1-exp(-6*A))
   ΩΩαααα = (15) * 3**2 * F2 * (1-exp(-9*A))
   ΦΦΦααα = (20) * (5**3) * F3 * (1-exp(-8*A))
   ΦΦΩααα = (60) * (5*5*3) * F3 * (1-exp(-12*A))
   ΦΩΩααα = (60) * (5*3*3) * F3 * (1-exp(-18*A))
   ΩΩΩααα = (20) * (3)**3 * F3 * (1-exp(-27*A))
   ΦΦΦΦαα = (15) * (5)**4 * F4 * (1-exp(-16*A))
   ΦΦΦΩαα = (60) * (5)**3 * 3 * F4 * (1-exp(-24*A))
   ΦΦΩΩαα = (90) * (5*5*3*3) * F4 * (1-exp(-36*A))
   ΦΩΩΩαα = (60) * (5*3*3*3) * F4 * (1-exp(-54*A))
   ΩΩΩΩαα = (15) * (3)**4 * F4 * (1-exp(-81*A))
   ΦΦΦΦΦα = (6) * (5)**5 * F5 * (1-exp(-32*A))
   ΦΦΦΦΩα = 30*5*5*5*5*3 * F5 * (1-exp(-48*A))
   ΦΦΦΩΩα = 60*5*5*5*3*3 * F5 * (1-exp(-72*A))
   ΦΦΩΩΩα = 60*5*5*3*3*3 * F5 * (1-exp(-108*A))
   ΦΩΩΩΩα = 30*5*3*3*3*3 * F5 * (1-exp(-162*A))
   ΩΩΩΩΩα = 6*3*3*3*3*3 * F5 * (1-exp(-243*A))
   ΦΦΦΦΦΦ = 5**6 * (1-exp(-64*A))
   ΦΦΦΦΦΩ = 6*3*5**5 * (1-exp(-96*A))
   ΦΦΦΦΩΩ = 15*3*3*5**4 * (1-exp(-144*A))
   ΦΦΦΩΩΩ = 20*3**3*5**3 * (1-exp(-216*A))
   ΦΦΩΩΩΩ = 15*3**4*5**2 * (1-exp(-324*A))
   ΦΩΩΩΩΩ = 6*3**5*5 * (1-exp(-486*A))
   ΩΩΩΩΩΩ = 3**6 * (1-exp(-729*A))

   total = αααααα + Φααααα + Ωααααα + ΦΦαααα + ΦΩαααα +
           ΩΩαααα + ΦΦΦααα + ΦΦΩααα + ΦΩΩααα + ΩΩΩααα +
           ΦΦΦΦαα + ΦΦΦΩαα + ΦΦΩΩαα + ΦΩΩΩαα + ΩΩΩΩαα +
           ΦΦΦΦΦα + ΦΦΦΦΩα + ΦΦΦΩΩα + ΦΦΩΩΩα + ΦΩΩΩΩα +
           ΩΩΩΩΩα + ΦΦΦΦΦΦ + ΦΦΦΦΦΩ + ΦΦΦΦΩΩ + ΦΦΦΩΩΩ +
           ΦΦΩΩΩΩ + ΦΩΩΩΩΩ + ΩΩΩΩΩΩ
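
The formulae of part D all have the same shape: the number of peptides in a class multiplied by the chance, 1-exp(-m*A), that a peptide encoded by m DNA sequences appears at least once. A minimal sketch that evaluates all twenty-eight classes in one loop, assuming Poisson sampling as the formulae imply; the function name `expected_distinct` is illustrative.

```python
from math import comb, exp


def expected_distinct(lsize, length=6):
    """Expected number of different peptides seen in each class of an (NNK)6 library.

    lsize is the number of independent transformants; keys of the returned
    dict are (number of phi residues, number of omega residues).
    """
    a = lsize / 31 ** length                 # A = Lsize/(31**6)
    expected = {}
    for n_phi in range(length + 1):
        for n_omega in range(length + 1 - n_phi):
            n_alpha = length - n_phi - n_omega
            members = (comb(length, n_phi) * comb(length - n_phi, n_omega)
                       * 5 ** n_phi * 3 ** n_omega * 12 ** n_alpha)
            multiplicity = 2 ** n_phi * 3 ** n_omega   # DNA sequences per peptide
            expected[(n_phi, n_omega)] = members * (1.0 - exp(-multiplicity * a))
    return expected


library = expected_distinct(1.0e6)
total = sum(library.values())
print(f"total = {total:.4E}   % sampled = {100 * total / 20 ** 6:.2f}")
```

For a library of 1.0E+06 transformants this gives a total near 9.74E+05, in line with the first block of part C.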
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2

X can be F, S2, Y, C, L, P, H, R, I, T, N, V, A, D, G
Γ can be L2, R2, S, W, P, Q, M, T, K, V, A, E, G

Library comprises 8.55·10^6 amino-acid sequences; 1.47·10^7 DNA
sequences.

Total number of possible aa sequences = 8,555,625

   x can be one of [LVPTARGFYCHIND]
   S can be [S]
   θ can be one of [VPTAGWQMKES]
   Ω can be one of [LR]

The first, second, fifth, and sixth positions
can hold x or S; the third and fourth positions can hold θ or
Ω. I have lumped sequences by the number of xs, Ss, θs, and
Ωs.

For example xxθΩSS stands for:

   [xxθΩSS, xSθΩxS, xSθΩSx, SSθΩxx, SxθΩxS, SxθΩSx,
    xxΩθSS, xSΩθxS, xSΩθSx, SSΩθxx, SxΩθxS, SxΩθSx]

The following table shows the likelihood that
any particular DNA sequence will fall into one of the
defined classes.

Library size = 1.0          Sampling = .00001%

   total.....   1.0000E+00        %sampled..   1.1688E-07
   xxθθxx....   3.1524E-01        xxθΩxx....   2.2926E-01
   xxΩΩxx....   4.1684E-02        xxθθxS....   1.8013E-01
   xxθΩxS....   1.3101E-01        xxΩΩxS....   2.3819E-02
   xxθθSS....   3.8600E-02        xxθΩSS....   2.8073E-02
   xxΩΩSS....   5.1042E-03        xSθθSS....   3.6762E-03
   xSθΩSS....   2.6736E-03        xSΩΩSS....   4.8611E-04
   SSθθSS....   1.3129E-04        SSθΩSS....   9.5486E-05
   SSΩΩSS....   1.7361E-05
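
The library sizes and class likelihoods for (NNT)4(NNG)2 can be checked in the same way as for (NNK)6. The sketch assumes that NNT supplies fourteen single-codon amino acids plus serine (two codons), and that NNG supplies eleven single-codon amino acids plus leucine and arginine (two codons each) and one stop codon; `class_stats` is an illustrative name.

```python
from math import comb

TOTAL_AA = 15 ** 4 * 13 ** 2      # amino-acid sequences
TOTAL_DNA = 16 ** 4 * 15 ** 2     # stop-free DNA sequences (NNG loses TAG)


def class_stats(n_s, n_omega):
    """Return (members, probability) for a class with n_s serines among the four
    NNT positions and n_omega residues from [L, R] among the two NNG positions."""
    members = (comb(4, n_s) * 14 ** (4 - n_s)
               * comb(2, n_omega) * 11 ** (2 - n_omega) * 2 ** n_omega)
    dna_per_member = 2 ** n_s * 2 ** n_omega
    return members, members * dna_per_member / TOTAL_DNA


stats = {(s, o): class_stats(s, o) for s in range(5) for o in range(3)}
print(TOTAL_AA, TOTAL_DNA, f"{stats[(0, 0)][1]:.4E}")
```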
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

The following sections show how many sequences
of each class are expected for libraries of different sizes.

Library size = 1.0000E+05
total = 9.9137E+04     fraction sampled = 1.1587E-02

   Type           Number      %         Type           Number      %
   xxθθxx...     31416.9 (   .7)        xxθΩxx...     22771.4 (  1.3)
   xxΩΩxx...      4112.4 (  2.7)        xxθθxS...     17891.8 (  1.3)
   xxθΩxS...     12924.6 (  2.7)        xxΩΩxS...      2318.5 (  5.3)
   xxθθSS...      3808.1 (  2.7)        xxθΩSS...      2732.5 (  5.3)
   xxΩΩSS...       483.7 ( 10.3)        xSθθSS...       357.8 (  5.3)
   xSθΩSS...       253.4 ( 10.3)        xSΩΩSS...        43.7 ( 19.5)
   SSθθSS...        12.4 ( 10.3)        SSθΩSS...         8.6 ( 19.5)
   SSΩΩSS...         1.4 ( 35.2)

Library size = 1.0000E+06
total = 9.2064E+05     fraction sampled = 1.0761E-01

   Type           Number      %         Type           Number      %
   xxθθxx...    304783.9 (  6.6)        xxθΩxx...    214394.0 ( 12.7)
   xxΩΩxx...     36508.6 ( 23.8)        xxθθxS...    168452.5 ( 12.7)
   xxθΩxS...    114741.4 ( 23.8)        xxΩΩxS...     18383.8 ( 41.9)
   xxθθSS...     33807.7 ( 23.8)        xxθΩSS...     21666.6 ( 41.9)
   xxΩΩSS...      3114.6 ( 66.2)        xSθθSS...      2837.3 ( 41.9)
   xSθΩSS...      1631.5 ( 66.2)        xSΩΩSS...       198.4 ( 88.6)
   SSθθSS...        80.1 ( 66.2)        SSθΩSS...        39.0 ( 88.6)
   SSΩΩSS...         3.9 ( 98.7)



Library size = 3.0000E+06
total = 2.3880E+06     fraction sampled = 2.7912E-01

   Type           Number      %         Type           Number      %
   xxθθxx...    855709.5 ( 18.4)        xxθΩxx...    565051.6 ( 33.4)
   xxΩΩxx...     85564.7 ( 55.7)        xxθθxS...    443969.1 ( 33.4)
   xxθΩxS...    268917.8 ( 55.7)        xxΩΩxS...     35281.3 ( 80.4)
   xxθθSS...     79234.7 ( 55.7)        xxθΩSS...     41581.5 ( 80.4)
   xxΩΩSS...      4522.6 ( 96.1)        xSθθSS...      5445.2 ( 80.4)
   xSθΩSS...      2369.0 ( 96.1)        xSΩΩSS...       223.7 ( 99.9)
   SSθθSS...       116.3 ( 96.1)        SSθΩSS...        43.9 ( 99.9)
   SSΩΩSS...         4.0 (100.0)



WO 92/15677 



PCT/US92/01456 



Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

Library size = 8.5556E+06
total = 4.9303E+06     fraction sampled = 5.7626E-01

   Type           Number      %         Type           Number      %
   xxθθxx...   2046301.0 ( 44.0)        xxθΩxx...   1160645.0 ( 68.7)
   xxΩΩxx...    138575.9 ( 90.2)        xxθθxS...    911935.6 ( 68.7)
   xxθΩxS...    435524.3 ( 90.2)        xxΩΩxS...     43480.7 ( 99.0)
   xxθθSS...    128324.1 ( 90.2)        xxθΩSS...     51245.1 ( 99.0)
   xxΩΩSS...      4703.6 (100.0)        xSθθSS...      6710.7 ( 99.0)
   xSθΩSS...      2463.8 (100.0)        xSΩΩSS...       224.0 (100.0)
   SSθθSS...       121.0 (100.0)        SSθΩSS...        44.0 (100.0)
   SSΩΩSS...         4.0 (100.0)



Library size = 1.0000E+07
total = 5.3667E+06     fraction sampled = 6.2727E-01

   Type           Number      %         Type           Number      %
   xxθθxx...   2289093.0 ( 49.2)        xxθΩxx...   1254877.0 ( 74.2)
   xxΩΩxx...    143467.0 ( 93.4)        xxθθxS...    985974.9 ( 74.2)
   xxθΩxS...    450896.3 ( 93.4)        xxΩΩxS...     43710.7 ( 99.6)
   xxθθSS...    132853.4 ( 93.4)        xxθΩSS...     51516.1 ( 99.6)
   xxΩΩSS...      4703.9 (100.0)        xSθθSS...      6746.2 ( 99.6)
   xSθΩSS...      2464.0 (100.0)        xSΩΩSS...       224.0 (100.0)
   SSθθSS...       121.0 (100.0)        SSθΩSS...        44.0 (100.0)
   SSΩΩSS...         4.0 (100.0)



Library size = 3.0000E+07
total = 7.8961E+06     fraction sampled = 9.2291E-01

   Type           Number      %         Type           Number      %
   xxθθxx...   4040589.0 ( 86.9)        xxθΩxx...   1661409.0 ( 98.3)
   xxΩΩxx...    153619.1 (100.0)        xxθθxS...   1305393.0 ( 98.3)
   xxθΩxS...    482802.9 (100.0)        xxΩΩxS...     43904.0 (100.0)
   xxθθSS...    142254.4 (100.0)        xxθΩSS...     51744.0 (100.0)
   xxΩΩSS...      4704.0 (100.0)        xSθθSS...      6776.0 (100.0)
   xSθΩSS...      2464.0 (100.0)        xSΩΩSS...       224.0 (100.0)
   SSθθSS...       121.0 (100.0)        SSθΩSS...        44.0 (100.0)
   SSΩΩSS...         4.0 (100.0)
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

Library size = 5.0000E+07
total = 8.3956E+06     fraction sampled = 9.8130E-01

   Type           Number      %         Type           Number      %
   xxθθxx...   4491779.0 ( 96.6)        xxθΩxx...   1688387.0 ( 99.9)
   xxΩΩxx...    153663.8 (100.0)        xxθθxS...   1326590.0 ( 99.9)
   xxθΩxS...    482943.4 (100.0)        xxΩΩxS...     43904.0 (100.0)
   xxθθSS...    142295.8 (100.0)        xxθΩSS...     51744.0 (100.0)
   xxΩΩSS...      4704.0 (100.0)        xSθθSS...      6776.0 (100.0)
   xSθΩSS...      2464.0 (100.0)        xSΩΩSS...       224.0 (100.0)
   SSθθSS...       121.0 (100.0)        SSθΩSS...        44.0 (100.0)
   SSΩΩSS...         4.0 (100.0)



Library size = 1.0000E+08
total = 8.5503E+06     fraction sampled = 9.9938E-01

   Type           Number      %         Type           Number      %
   xxθθxx...   4643063.0 ( 99.9)        xxθΩxx...   1690302.0 (100.0)
   xxΩΩxx...    153664.0 (100.0)        xxθθxS...   1328094.0 (100.0)
   xxθΩxS...    482944.0 (100.0)        xxΩΩxS...     43904.0 (100.0)
   xxθθSS...    142296.0 (100.0)        xxθΩSS...     51744.0 (100.0)
   xxΩΩSS...      4704.0 (100.0)        xSθθSS...      6776.0 (100.0)
   xSθΩSS...      2464.0 (100.0)        xSΩΩSS...       224.0 (100.0)
   SSθθSS...       121.0 (100.0)        SSθΩSS...        44.0 (100.0)
   SSΩΩSS...         4.0 (100.0)




Table 132: Relative efficiencies of
various simple variegation codons

                               Number of codons
                        5               6               7
   vgCodon          #DNA/#AA        #DNA/#AA        #DNA/#AA
                     [#DNA]          [#DNA]          [#DNA]
                     (#AA)           (#AA)           (#AA)

   NNK                8.95           13.86           21.49
   assuming        [2.86·10^7]     [8.87·10^8]     [2.75·10^10]
   stops vanish    (3.2·10^6)      (6.4·10^7)      (1.28·10^9)

   NNT                1.38            1.47            1.57
                   [1.05·10^6]     [1.68·10^7]     [2.68·10^8]
                   (7.59·10^5)     (1.14·10^7)     (1.71·10^8)

   NNG                2.04            2.36            2.72
   assuming        [7.59·10^5]     [1.14·10^7]     [1.71·10^8]
   stops vanish    (3.7·10^5)      (4.83·10^6)     (6.27·10^7)
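
The entries of Table 132 are powers of the number of stop-free codons and of the number of amino acids encoded by each vgCodon. A minimal sketch under the assumption that stop-containing sequences are discarded ("stops vanish"), so that NNK contributes 31 codons for 20 amino acids, NNT 16 codons for 15 amino acids, and NNG 15 codons for 13 amino acids; the dictionary `CODONS` and function `efficiency` are illustrative names.

```python
# (stop-free codons, distinct amino acids) for each simple variegation codon.
CODONS = {
    "NNK": (31, 20),
    "NNT": (16, 15),
    "NNG": (15, 13),
}


def efficiency(vg_codon, n):
    """Return (#DNA, #AA, #DNA/#AA) for n variegated codons of the given type."""
    n_dna, n_aa = CODONS[vg_codon]
    return n_dna ** n, n_aa ** n, (n_dna / n_aa) ** n


for name in CODONS:
    for n in (5, 6, 7):
        dna, aa, ratio = efficiency(name, n)
        print(f"{name}  n={n}  {ratio:6.2f}  [{dna:.2e}]  ({aa:.2e})")
```

A ratio near 1 means that few DNA sequences are spent on redundant encodings of the same peptide, which is why NNT compares favorably with NNK in this table.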
Table 155

Distance in Å between alpha carbons in octapeptides:

Extended Strand: angle of Cα1-Cα2-Cα3 = 138°

           1      2      3      4      5      6      7
     1
     2    3.8
     3    7.1    3.8
     4   10.7    7.1    3.8
     5   14.2   10.7    7.1    3.8
     6   17.7   14.1   10.7    7.1    3.8
     7   21.2   17.7   14.1   10.6    7.0    3.8
     8   24.6   20.9   17.5   13.9   10.6    7.0    3.8

Reverse turn between residues 4 and 5.

           1      2      3      4      5      6      7
     1
     2    3.8
     3    7.1    3.8
     4   10.6    7.0    3.8
     5   11.6    8.0    6.1    3.8
     6    9.0    5.8    5.5    5.6    3.8
     7    6.2    4.1    6.3    8.0    7.0    3.8
     8    5.8    6.0    9.1   11.6   10.7    7.2    3.8

Alpha helix: angle of Cα1-Cα2-Cα3 = 93°

           1      2      3      4      5      6      7
     1
     2    3.8
     3    5.5    3.8
     4    5.1    5.4    3.8
     5    6.6    5.3    5.5    3.8
     6    9.3    7.0    5.6    5.5    3.8
     7   10.4    9.3    6.9    5.4    5.5    3.8
     8   11.3   10.7    9.5    6.8    5.6    5.6    3.8
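
The distance between alpha carbons two residues apart follows from the stated Cα-Cα-Cα angle by the law of cosines. The sketch assumes a fixed 3.8 Å separation between adjacent alpha carbons, as in the tables; the function name `distance_1_3` is illustrative.

```python
from math import cos, radians, sqrt

CA_CA = 3.8  # distance in Å between adjacent alpha carbons


def distance_1_3(angle_deg, bond=CA_CA):
    """Distance between Cα(i) and Cα(i+2) for a given Cα-Cα-Cα angle in degrees."""
    theta = radians(angle_deg)
    # Law of cosines with two equal sides of length `bond`.
    return sqrt(2.0 * bond ** 2 * (1.0 - cos(theta)))


print(f"extended strand (138°): {distance_1_3(138):.1f} Å")
print(f"alpha helix     ( 93°): {distance_1_3(93):.1f} Å")
```

Longer-range entries depend on the dihedral angles as well and are not reproduced by this two-bond calculation.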
Table 156

Distances between alpha carbons in closed mini-proteins of
the form disulfide cyclo (CXXXXC)

Minimum distance

           1      2      3      4      5
     1
     2    3.8
     3    5.9    3.8
     4    5.6    6.0    3.8
     5    4.7    5.9    6.0    3.8
     6    4.8    5.3    5.1    5.2    3.8

Average distance

           1      2      3      4      5
     1
     2    3.8
     3    6.3    3.8
     4    7.5    6.4    3.8
     5    7.1    7.5    6.3    3.8
     6    5.6    7.5    7.7    6.4    3.8

Maximum distance

           1      2      3      4      5
     1
     2    3.8
     3    6.7    3.8
     4    9.0    6.9    3.8
     5    8.7    8.8    6.8    3.8
     6    6.6    9.2    9.1    6.8    3.8
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Table 820: Peptide Phage 



Putative Streptavidin 
5 Name Binding Peptide Seq. 



Antibiotic 
Resistance 
- Marker 



HPQ A EG P C H P Q F - - C Q S Y I E G R; I V - ..- - - . E. . . 

DEV(P) A "E - P G H P Q Y R L C Q R P L K Q 'P P P P P P A E. . 

Dev (E) A E - .L; C H P Q. F P R; C N L F, R -K > ,P p P P P P A E ; 

10 HPQ 6 A E G P C H P Q F PRC Y I E G R I V. \ - - - - . - . E . . ." 



15 



1 2 3 4 5 

- - - r C 



1 1 1 1 1. 1 I'l l 1 2 2 2 2 2 2 2 
6 7 89 0 12 3 4 5 6:7 8 9.0 1 2 3 4 5 6 

----- - _ • .--cr - - :- - . .i.. - - e 
Table 838: Streptavidin-binding
disulfide-constrained peptides

clone  glu gly  X  cys  X   X   X   X  cys  X  ser   Frequency
#2     glu gly tyr cys his pro gln phe cys pro ser       4
#4     glu gly his cys his pro gln phe cys ser ser       3
#5     glu gly leu cys his pro gln phe cys gly ser       2
#8     glu gly asp cys his pro gln phe cys ser ser       2
#1     glu gly asn cys his pro gln phe cys pro ser       1
#3     glu gly asp cys his pro gln phe cys arg ser       1
#13    glu gly asp cys his pro gln phe cys val ser       1

                   cys his pro gln phe cys   consensus

Table 839: Sequences Obtained by
Enrichment over BSA

clone  glu gly  X  cys  X   X   X   X  cys  X  ser   Frequency
#21    glu gly gly cys phe lys arg asn cys tyr ser       1
#22    glu gly his cys asp lys lys ile cys leu ser       1
#23    glu gly phe cys his thr ala ala cys phe ser       1
#24    glu gly his cys tyr lys gly val cys ser ser       1
#25    glu gly his cys asp lys trp arg cys pro ser       1
#26    glu gly ile cys tyr arg leu asp cys ile ser       1
#27    glu gly gly cys phe pro trp his cys phe ser       1
#28    glu gly ser cys asp ser leu arg cys asp ser       1

No consensus observed.
CLAIMS

1. In a process for developing novel binding proteins
with a desired binding activity against a particular target
material comprising providing a population of genetic packages,
each displaying one or more copies of a particular potential
binding domain as part of a chimeric outer surface protein
thereof, said potential binding domain not being natively
associated with the outer surface of said package, said
population collectively displaying a plurality of different
potential binding domains, the differentiation among said
plurality of different potential binding domains occurring
through the at least partially random variation of one or more
predetermined amino acid positions, but not all amino acid
positions, of said parental binding domain to randomly obtain
at each said variable position an amino acid belonging to a
predetermined set of two or more amino acids, the amino acids
of said set occurring at said position in predetermined
expected proportions; contacting the packages with the target
material; and separating the packages according to their
affinity for said target material;

the improvement comprising essentially each said
potential binding domain being a mini-protein sequence of less
than forty amino acids and having at least one intrachain
covalent crosslink between at least a first amino acid position
and a second amino acid position thereof, the amino acids at
said first and second positions being invariant in all of the
chimeric proteins displayed by said population, with those
residues which participate in the formation of a covalent
crosslink being invariant throughout said population, with the
proviso that when the crosslink is in the form of a disulfide
bond, the potential binding domain is a micro-protein sequence
of less than forty amino acids.
2. The method of claim 1 wherein the crosslink is a
disulfide bond and the amino acids at the first and second
amino acid positions are cysteines.

3. The method of claim 2 in which the micro-protein
domain has a single disulfide bond and the span of the bond is
not more than nine amino acid residues.

4. The method of claim 2 in which the micro-protein
domain has a single disulfide bond, wherein the disulfide bond
bridges a sequence of amino acids which under affinity
separation conditions collectively assume a hairpin
supersecondary structure.

5. The method of claim 4 wherein the hairpin
supersecondary structure is selected from the group consisting of
(a) an α helix, a turn, and a β strand; (b) an α helix, a turn,
and an α helix; and (c) a β strand, a turn, and a β strand.

6. The method of claim 2 wherein the micro-protein
domain comprises two intrachain disulfide bonds and preferably
includes two clustered cysteines.

7. The method of claim 6 wherein the micro-protein
domain has two disulfide bonds having a connectivity pattern of
1-3, 2-4.

8. The method of claim 2 wherein the micro-protein
domain comprises three intrachain disulfide bonds and
preferably includes two clustered cysteines.

9. The method of claim 8 wherein the micro-protein
domain has three disulfide bonds having a connectivity pattern
of 1-4, 2-5, 3-6.

10. The method of claim 7 wherein the micro-protein
domain substantially corresponds in sequence to an α-conotoxin.

11. The method of claim 9 wherein the micro-protein
domain substantially corresponds in sequence to a mu- or omega-
conotoxin.

12. The method of claim 6 wherein the micro-protein
domain substantially corresponds in sequence to a micro-protein
selected from the group consisting of Escherichia coli heat
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stable toxin I (ST X ) , the bee venom apamin, or a squash- seed 
trypsin inhibitor, the scorpion toxin, charybdotoxin and 
secretory leukocyte protease inhibitor. 

13. The method of claim 1 wherein the covalent
crosslink includes a metal atom, such as zinc, iron, copper or
cobalt.

14. The method of any of claims 1-13 wherein at least 
one variable amino acid position in said potential binding
domains was encoded by a simply variegated codon selected from 
the group consisting of NNT, NNG, RNG, RMG, VNT, RRS, and SNT. 

15. The method of any of claims 1-13 wherein none of 
the variable amino acid positions in said potential binding 
domain was encoded by a simply variegated codon selected from
the group consisting of NNN, NNK and NNS. 

16. The method of any of claims 1-13 wherein at least
one variable amino acid position in said potential binding 
domains was encoded by a complexly variegated codon. 

17. The method of any of claims 1-16 wherein the
replicable genetic package is a phage, preferably a DNA phage
other than phage lambda, more preferably a filamentous phage.

18. The method of claim 17 wherein the potential
binding domain is fused with the major coat protein of a
filamentous phage or an assemblable fragment thereof, or with
the gene III protein of a filamentous phage or an assemblable
fragment thereof.

19. The method of any of claims 1-16 wherein the
replicable genetic package is a bacterial cell, such as strains
of Escherichia coli, Salmonella typhimurium, Pseudomonas
aeruginosa, Klebsiella pneumoniae, Neisseria gonorrhoeae, or
Bacillus subtilis, said DNA construct further comprises a
periplasmic secretion signal sequence, and the potential
binding domain is fused with a bacterial outer surface protein
such as the LamB protein, OmpA, OmpC, OmpF, phospholipase A, or
pilin, or an assemblable segment thereof.

20. The method of any of claims 1-19 wherein said
population is characterized by the display of at least 10^5
different potential binding domains, and wherein, for any
potentially encoded potential binding domain, the probability
that it will be displayed by at least one package in said
population is at least 50%, more preferably at least 90%.

21. A library of display phage or cells, each
displaying one or more copies of a particular potential binding
domain as part of a chimeric outer surface protein thereof,
said potential binding domain not being natively associated
with the outer surface of said phage or cells, said library
collectively displaying a plurality of different potential
binding domains, the differentiation among said plurality of
different potential binding domains occurring through the at
least partially random variation of one or more predetermined
amino acid positions, but not all amino acid positions, of said
parental binding domain to randomly obtain at each said
variable position an amino acid belonging to a predetermined
set of two or more amino acids, the amino acids of said set
occurring at said position in predetermined expected
proportions,

essentially each said potential binding domain being a mini-
protein sequence of less than sixty amino acids and having at
least one intrachain covalent crosslink between at least a
first amino acid position and a second amino acid position
thereof, the amino acids at said first and second positions
being invariant in all of the chimeric proteins displayed by
said population, with those residues which participate in the
formation of a covalent crosslink being invariant throughout
said population, with the proviso that when the crosslink is a
disulfide bond, the potential binding domain is a micro-protein
of less than 40 residues.
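
By way of illustration only, and forming no part of the claims, the
simply variegated codons recited in claims 14 and 15 and the display
probability recited in claim 20 can be explored with the following
minimal Python sketch. The function names are hypothetical. The
sketch assumes the standard genetic code and the usual IUPAC
nucleotide codes (N = A/C/G/T, K = G/T, S = C/G, R = A/G, M = A/C,
V = A/C/G), with each permitted base equally likely at a variegated
position. The coverage function further assumes that every variant
is equally probable and that packages are independent draws, which
is a simplification and not a method stated in the claims.

```python
from itertools import product

# IUPAC nucleotide codes used by the variegated codons in claims 14 and 15.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "K": "GT", "S": "CG", "R": "AG", "M": "AC", "V": "ACG",
}

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(variegated_codon):
    """List every concrete codon allowed by a variegated codon such as 'NNT'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in variegated_codon))]


def encoded(variegated_codon):
    """Count the codons giving each amino acid ('*' marks a stop codon)."""
    counts = {}
    for codon in expand(variegated_codon):
        aa = CODON_TABLE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def coverage_probability(n_variants, n_packages):
    """Chance that one given variant appears at least once among n_packages
    independent packages, assuming all n_variants are equally likely
    (an illustrative assumption)."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_packages


if __name__ == "__main__":
    for vg in ("NNT", "NNG", "RNG", "RMG", "VNT", "RRS", "SNT",
               "NNN", "NNK", "NNS"):
        counts = encoded(vg)
        stops = counts.get("*", 0)
        amino_acids = sorted(aa for aa in counts if aa != "*")
        print(vg, len(expand(vg)), "codons,", len(amino_acids),
              "amino acids,", stops, "stop codon(s)")
    print(coverage_probability(10**5, 10**5))
```

Under these assumptions the sketch reports, for each variegated
codon, how many distinct amino acids it can encode and whether any
stop codon is included, and it estimates how many independent
packages a population of a given diversity would need to reach a
chosen display probability.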